THEORIES OF VIGILANCE
Vigilance research concerns the
attentiveness of the subject and his 
capability for detecting changes in 
stimulus events over relatively long 
periods of sustained observation. 
Interest in this topic has accelerated 
rapidly and the volume of experi- 
mental findings has increased steadily 
in recent years. With investigators 
spread over several continents and 
publishing under the sponsorship of 
numerous military, industrial, and 
academic organizations, it has be- 
come a major problem to keep up 
with the technical literature. This re- 
port is a critical survey of the existing 
literature, with emphasis being given 
to the organization of experimental 
results under the several theoretical 
hypotheses which have been ad- 
vanced in explanation of the findings. 
The many diverse sources of techni- 
cal papers made complete coverage of 
the literature difficult, but it is be- 
lieved that only a small fraction of the 
papers relevant to contemporary 
theories were unavailable. 

The increased number and pro- 
ductivity of researchers has been 
associated with a greater variety
of experimental situations. Those 
covered in this paper generally are 
one of four types: (a) Classical vigi- 
lance tasks, e.g., the Mackworth 
Clock Test (Mackworth, 1950). Near- 
threshold transient critical signals 
are randomly presented against a 
background of neutral signals. (b)
Multiple display situations where a 
critical signal could occur at any one 
of several stimulus sources, e.g., 
Broadbent’s Twenty Dials Test
(Broadbent, 1950). Constant scan- 
ning of the several stimulus sources is 
required. (c) Threshold measure- 
ment, e.g., Bakan (1955). A train of 
signals is presented, starting at ran- 
dom intervals in time, with an in- 
tensity increment at each step until 
the observer detects the signal. (d) 
Observing response experiments, e.g., 
Holland (1957, 1958), where visual 
attending is measured indirectly 
through some other response that sug- 
gests observing of the stimulus dis- 
play. Frequency of observing is then 
related to percentage detection of the 
critical signal occurring on the dis- 
play. 


Although many experiments were 
generated by specific practical ques- 
tions, a framework for organizing the 
accumulation of empirical findings 
has not been neglected. It is now
possible to distinguish a number of
explanatory systems. The main pur- 
pose of this paper is to review these 
models and discuss them in terms of
their effectiveness in accounting for 
the empirical findings. 


INHIBITION 


Mackworth (1950) advanced the 
first comprehensive interpretation of 
vigilance behavior relating observed 
phenomena of watch-keeping to prin- 
ciples of Pavlovian classical condi- 
tioning. In the same report Mack- 
worth presented extensive data on 
deterioration of criterion perform- 
ance on a number of different tasks 
under conditions of prolonged moni- 
toring—the Clock Test, the Syn- 
thetic Radar Test, and the Auditory 
Listening Test. The Clock Test, used 
most extensively, had a blank circular 
face with a hand that moved one step 
each second. Occasionally the hand 
moved in a double step, according to 
a prearranged schedule, and this was 
the critical signal to be detected and 
reported by pressing a response key. 
Over a 2-hour observation period it
was found that the percentage of
critical signals detected was a de- 
creasing negatively accelerated func- 
tion of time, with the greatest drop 
occurring during the first half-hour. 
The analogy drawn with classical 
conditioning was that original condi- 
tioning took place in the demonstra- 
tion period where the conditioned 
stimulus was the double jump of the 
clock hand and the unconditioned 
stimulus was the experimenter’s informing
comment “Now!” The conditioned
voluntary response was the
subject’s pressing the key to the 
double jump. Knowledge of results is 
a reinforcing state of affairs. The 2-hour
observation period, then, was
considered an extinction period in which
the unconditioned stimulus and the
reinforcement provided by the experimenter
were absent. During extinction
the percentage of detections
declined, and Mackworth attributed 
this decline to the growth of internal
inhibition. Other evidence for inter- 
preting vigilance data in terms of 
Pavlovian conditioning was the tem- 
porary but complete restoration of 
initial performance level by the oc- 
currence of a telephone message to 
the observer in the middle of the 
watch-keeping session. Mackworth 
viewed this as an instance of disin- 
hibition where an alien stimulus pro- 
duced a temporary increase in re- 
sponsiveness. Other evidence for a 
classical conditioning interpretation 
occurred in an experiment when the 
experimenter provided knowledge of 
results after each double jump of the 
clock hand and prevented the occur- 
rence of a decrement in detection. 
This, within Mackworth’s explana- 
tory frame of reference, would be a 
reinforcing operation and would be 
expected to keep the performance 
level high. 

Mackworth was obliged to qualify 
a strict interpretation in terms of 
classical conditioning because he was 
unable to obtain anything near com- 
plete failure of responding, i.e., total 
experimental extinction. In fact, for 
the Clock Test, the level of detection 
for the critical signal ordinarily stabi- 
lized at about 70-75%. A state of 
expectancy and self-instructions were 
hypothesized as partly replacing the 
unconditioned stimulus and its rein- 
forcing function. 

The inhibition analysis of vigilance 
behavior came with a ready-made 
development so that the theorist’s 
task has been one of coordinating 
aspects of vigilance behavior with 
conditioning phenomena. From the 
standpoint of handling more recent 
results, inhibition does not fare very 
well. For example, a high frequency 
of signals should result in a greater 
vigilance decrement than low fre- 
quency signals because, within the 
classical conditioning framework of 
the inhibition hypothesis, it represents
a relatively high frequency of ex- 
tinction trials. Yet, Deese and 
Ormond (1953) and Jenkins (1958)
show the opposite to be true. Mack- 
worth (1957) himself has come to 
regard the expectancy explanation 
as more important in accounting for 
recent findings, but he was not ready 
to dispense with reinforcement and 
nonreinforcement effects completely. 
Although the inhibition hypothesis 
plausibly accounted for many of the 
observed results of Mackworth’s ex- 
periments, it never gained wide ac- 
ceptance. Mackworth’s research 
rather than his interpretation has 
been responsible for generating new 
experiments. Reluctance to accept 
the inhibition explanation also has 
been based on attitudes towards 
theory construction. For example,
Deese (1955, p. 366) felt it unnecessary
to postulate separate inhibitory
and excitatory processes when a single
state of vigilance which declines
under specified conditions will handle 
the data as well. 


ATTENTION 


Broadbent (1953) carried the anal- 
ogy of watch-keeping to Pavlovian 
conditioning even further than Mack- 
worth. However, instead of inter- 
preting vigilance in terms of condi- 
tioning he interpreted both in terms 
of attention. He contends that the 
organism will select stimulus subsets 
from the impinging stimuli because 
(a) the nervous system cannot handle 
the total volume of stimulation at any 
given instant, and (b) adequate responding
to one part of the stimulus
situation is incompatible with ade- 
quate responding to another part. At 
least three properties of stimuli are 
important in determining priority of 
selection. Physically intense stimuli 
are more apt to be selected than weak 
stimuli. Stimuli of greater biological 
importance at the moment have higher
priority of selection. Finally, novel 
stimuli, i.e., those differing more from 
immediately preceding stimuli, have 
an increased likelihood of being se- 
lected. 

The decrement in accuracy of de- 
tection is accounted for by the com- 
petition of stimuli in Broadbent’s 
view. The repeated application of a 
stimulus results in reduced novelty, 
allowing other parts of the stimulat- 
ing situation to gain priority. Decre- 
ments over time have been reported 
for a variety of response measures. 
The following references are repre- 
sentative, not exhaustive: probability 
of detection (Adams, 1956; Bakan, 
1952, 1953, 1955, 1957; Deese & 
Ormond, 1953; Jenkins, 1958; Jerison, 
1958, 1959; Jerison & Wallis, 1957a, 
1957b; Kappauf & Powe, 1959; 
Mackworth, 1948, 1950, 1957), response
time (Garvey, Taylor, &
Newlin, 1959; McCormack, 1958), 
threshold intensity (Bakan, 1955; 
Garvey, Henson, & Gulledge, 1958; 
McFarland, Holway, & Hurvich, 
1942). 

During a rest period different stim- 
uli are selected allowing the original 
task stimuli to regain novelty. This 
corresponds to the observed improve- 
ment in detection following rest. 
Mackworth (1950) and Jenkins (1958)
found an increase in probability of 
detection with interspersed rest pe- 
riods. Adams (1956) reported recov- 
ery of decrement following a 10-min- 
ute rest. Similar recovery was found 
in response time (McCormack, 1958) 
and luminance threshold (McFar- 
land et al., 1942) following rest. 

In similar fashion, a new stimulus
introduced between applications of
the original stimulus will temporarily 
renew the novelty of the original one 
since it is then different from the im- 
mediately preceding stimulus. Mack- 
worth’s telephone message would fit 
this category. McFarland et al.
(1942) observed that forced conver- 
sation with an observer after 105 
minutes of measuring luminance 
threshold produced a marked but tem- 
porary increase in sensitivity. Body 
stretching had a similar effect and 
can be interpreted as new stimulation. 
The same reasoning applies when 
knowledge of results is given following 
each critical signal; the novelty of the 
signal is maintained (Baker, 1959b; 
Mackworth, 1950; Pollack & Knaff, 
1958). We see then that in Broad- 
bent’s model, disinhibition and rein- 
forcement are examples of the same 
phenomenon. 

When several sources must be 
monitored some of them have higher 
priority initially, but as the watch 
period progresses attention shifts 
towards previously neglected sources. 
The overall level of performance is 
maintained, but becomes irregular. 
In an experiment using 20 dials as 
signal sources Broadbent (1950) found
this to be the case. Loeb and Jeantheau
(1958) reported no decrement
in a 20 dials test; Howland (1958)
found the same result using four
meters. With a three-clock display,
detection level remained stable (Jeri- 
son & Wallis, 1957a; Jerison & Wing, 
1957) but in comparison with a one- 
clock display Jerison and Wallis 
(1957a) found overall detection level 
was much lower. A fine-grain analy- 
sis in the same report hinted that a 
decrement may have occurred in just 
the first 3-4 minutes of the watch 
with three clocks, but this is quite a 
different order of phenomenon than 
the large decrement for a one-clock 
test unit that develops over a rela- 
tively long time period. 

The critical signal itself serves as a 
novel stimulus and partially restores 
performance. Broadbent used this to 
explain Mackworth’s (1950) finding 
that observers who detected more 
signals maintained a higher level of
vigilance throughout the watch. 
This interpretation was also related 
to the fact of better performance with 
higher signal rates (Deese & Ormond, 
1953; Garvey et al., 1959; Jenkins, 
1958; Kappauf & Powe, 1959). 

A continuous intense irrelevant 
stimulus will not initially have much 
effect. As the watch continues and 
the original stimulus loses novelty 
any decrement will be accentuated. 
This is more likely with a single stimu- 
lus source. Broadbent (1954) had 
observers monitor 20 dials over a 5- 
day period with noise on Day 3 and 
Day 4. Under noise conditions there 
were significantly fewer responses 
made in 9 seconds or less, than in 
quiet conditions. Using 20 lights he 
found no significant difference, but 
the lights were more noticeable 
signals. Loeb and Jeantheau (1958) 
also using 20 dials reported longer 
response latencies throughout the 
watch with noise and vibration, but 
no changes with time. In a three- 
clock test, Jerison and Wing (1957) 
introduced noise for 1½ hours following
½ hour of quiet. A decrement
developed in the final half-hour. 
With only one clock, Jerison and 
Wallis (1957b) found no effect of 
noise relative to quiet conditions. 

Broadbent's model consists essen- 
tially of the assumption that selection 
of stimuli is necessary in accordance 
with the three stimulus properties 
given above. The formal develop- 
ment of the model was not carried 
beyond these broad assumptions. Al- 
though Broadbent did not intend to 
cover all of the facts of vigilance, 
even the results he did consider seem 
more specific than the model can con- 
vincingly handle. Application to the 
general trends such as decrement with 
prolonged watch, recovery of per- 
formance with rest, and maintained 
efficiency with continued knowledge 
of results appeared grossly to follow
from the principles of stimulus selec- 
tion. In other cases where the experi- 
mental situation was subjected to 
finer analysis it was difficult to see 
how Broadbent’s interpretations followed
from the model. A number of
his more detailed applications seemed
like ad hoc explanations of known 
results rather than predictions de- 
duced from the original principles. 


EXPECTANCY 


The expectancy hypothesis of vigi- 
lance was originally proposed by 
Deese (1955). He began with the 
notion of an excitatory state of vigi- 
lance which determines the probabil- 
ity of detection for any observer. The 
expectancy hypothesis states that: 
(a) the observer’s expectancy or prediction 
about the search task is determined by the 
actual course of stimulus events during his 
previous experience with the task, and (b) the 
observer's level of expectancy determines his 
vigilance level and hence his probability of 
detection (p. 362). 


It should be emphasized that the 
second part of the hypothesis does 
not, for Deese (1955), imply that 
level of vigilance is directly deter- 
mined by expectancy (pp. 364-365). 
The level of vigilance for any ob- 
server is also subject to modification 
due to changes in his motivational 
states whereas his extrapolation of 
future stimulus events might not be 
affected by such changes. Deese 
wanted to avoid the artificial situa- 
tion where expectancy completely 
determined vigilance. These states, 
according to Deese, are the basis of 
individual differences in vigilance and 
it is the psychologist’s task to dis- 
cover measures of behavior which 
predict levels of vigilance expected of 
an individual in a search task. But 
do these nonexpectancy states serve 
merely to raise or lower the prob- 
ability of detection by some constant 
amount throughout the task, or is the
form of the detection curve over time 
changed as well? Deese does not 
clarify this matter too well, but a free 
interpretation of his exposition on 
vigilance (Deese, 1955) would suggest 
that the nonexpectancy states deter- 
mine a base level for an individual’s 
vigilance. Expectancy, however, 
determines both the overall level and 
the short range variations in prob- 
ability of detection. It is assumed 
that the average level of expectancy, 
and thus detectability, is a positive 
function of signal rate, while the 
short range variations in expectancy 
are determined by the ongoing inter- 
signal interval. Deese assumes that 
expectancy is determined by all of 
the past stimulus events in the task 
and he elaborates this notion by re- 
lating expectancy to intersignal inter- 
val and stating that it increases up to 
the value of the mean intersignal 
interval and beyond. Thus, it would 
appear that probability of detection 
would be below average when an 
intersignal interval is less than the 
mean interval, and equal to or greater 
than the average probability of de- 
tection when the intersignal interval 
is equal to or greater than the mean. 

About the only evidence that can 
be found for the expectancy hypothesis
is that probability of detection is a
positive function of signal rate (Deese 
& Ormond, 1953; Jenkins, 1958). 
However, little or no evidence can be 
found in support of Deese’s views of 
expectancy as a function of inter- 
signal interval. In analysis of some 
of his own data (Deese & Ormond, 
1953), Deese found little effect of 
interval size, although a slight tend- 
ency for higher probability of detec- 
tion for longer intervals could be 
considered small support for the 
expectancy prediction. Analysis of 
intersignal interval has not led to any 
consistent results even yet. Jerison 
and Wallis (1957a) and McCormack
(1958) found no effect of interval size. 
Jenkins (1958) reported detection 
dropped monotonically with increas- 
ing intervals when average rate was 
as high as 480 per hour; at lower rates 
he found no effect. Bartlett, Beinert, 
and Graham (1955) found lower prob- 
ability of detection with longer inter- 
vals using a 40 per hour signal rate. 
Mackworth’s data showed better 
detection for brief intervals than for 
his longer 10-minute intervals. Kappauf
and Powe (1959) reported a U-shaped
function in an audio-visual
checking task. One can scarcely im- 
agine a more varied set of results and 
considering average rates and ranges 
of time intervals does not resolve the 
conflict among these data. Jenkins 
(1958) suggested that the average 
rate of signals has a much greater 
effect on detection level than short 
range fluctuations, so the issue be- 
comes less critical from a practical 
standpoint. Harabedian, McGrath, 
and Buckner (1960) emphasize that 
for a basic understanding, a major 
methodological problem exists in de- 
fining an intersignal interval because 
it can be expressed in terms of (a) 
time between signals, whether the 
signal is detected or not; (b) time
since the last detected signal; and (c) 
time since the last missed signal. 
Their results from audio and visual 
vigilance tasks revealed differences 
dependent upon the method chosen 
to define the interval. 
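
The three definitions of intersignal interval listed above can be made concrete with a small sketch. The event log, the function name, and the field names below are hypothetical and serve only to illustrate how one record of signals yields three different interval series; nothing here reproduces the procedure or data of Harabedian et al. (1960).

```python
from typing import List, Optional, Tuple

# One event per critical signal: (time in seconds, whether it was detected).
# The values used below are invented for illustration.
Event = Tuple[float, bool]


def interval_definitions(events: List[Event]) -> List[dict]:
    """For each signal, compute the preceding interval under three definitions:
    (a) time since the previous signal, detected or not;
    (b) time since the last detected signal;
    (c) time since the last missed signal.
    An entry is None when no such earlier signal exists."""
    rows = []
    last_any: Optional[float] = None
    last_hit: Optional[float] = None
    last_miss: Optional[float] = None
    for t, detected in events:
        rows.append({
            "time": t,
            "since_any": None if last_any is None else t - last_any,
            "since_detected": None if last_hit is None else t - last_hit,
            "since_missed": None if last_miss is None else t - last_miss,
        })
        # Update the reference points only after recording the current row.
        last_any = t
        if detected:
            last_hit = t
        else:
            last_miss = t
    return rows


events = [(0.0, True), (60.0, False), (150.0, True), (180.0, True)]
rows = interval_definitions(events)
for row in rows:
    print(row)
```

For the signal at 150 seconds in this invented log, definition (a) gives 90 seconds while definition (b) gives 150 seconds, so the same detection would be assigned to different interval classes depending on the definition chosen.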

Baker (1958, 1959a, 1959b, 1959c) 
has elaborated Deese’s expectancy 
hypothesis and has provided a body of 
experimental evidence in support of 
his own views. A major portion of 
Baker’s arguments in applying the 
expectancy model to experimental 
variables rests on the single consider- 
ation that an operator’s expectancy 
is based on how he perceives the 
actual series of stimulus events. Any 
variation which makes confirmation
of expectancy more likely or which 
allows more accurate perception of 
the actual stimulus events should 
lead to better performance. For ex- 
ample, when a signal is missed and 
the observer is unaware of the omis- 
sion, the intersignal interval is in- 
creased and expectancy is lowered. 
The concern of Harabedian et al. 
(1960) with problems of defining 
intersignal interval would seem to be 
close to Baker’s interests here. 

Operationally, Baker’s expanded 
definition of expectancy has five 
major classes of variables: 

Average signal rate. Baker’s pri- 
mary interests have been in predict- 
ing short range variations in detec- 
tion as a function of intersignal 
interval, but he would seem to agree 
with Deese and Jenkins that detec- 
tion probability is a positive function 
of average signal rate (Baker, 1959c). 

Regularity and range of the inter- 
signal interval. Regularity of the 
signal increases the probability that 
the expectancy state will be reinforced.
Baker contends that expectancy
grows as the interval following a
signal increases to the value of the 
mean intersignal interval and, beyond 
the mean value, expectancy falls to a 
low level. Notice that this is a modifi- 
cation of Deese’s view (1955). Baker 
(1959b) tested this hypothesis in a 
reaction time experiment similar to 
that of Mowrer (1940). He measured 
button pressing response times to an 
initial series of 20 light signals and 
then varied the interval before the 
final twenty-first signal. Using a 
2 × 2 factorial design the initial series
was presented at 10-second or 2- 
minute mean intervals with regular or 
irregular intersignal intervals. Fol- 
lowing the regular 10-second series, 
changes in reaction time to the 
twenty-first signal paralleled the predicted
course. For very short intervals
reaction time was long and, following
a decrease as the mean interval
was approached, reaction time again
became somewhat longer up to the 
highest value tested (30 seconds), but 
not nearly as long as the reaction time 
for short intervals. The irregular and 
longer interval condition showed little 
variation in reaction time to the 
twenty-first signal although when a 
trend appeared it tended to support 
the expectancy hypothesis. This ex- 
periment suggests that in vigilance 
tasks, where the signals are always 
irregular and usually occur at low 
average rates, one should not expect 
to find a large effect of intersignal 
interval. 
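
The qualitative difference between the two views of expectancy can be sketched numerically. The functional forms below are assumptions chosen only to reproduce the shapes described in the text (Deese: expectancy keeps rising up to and beyond the mean intersignal interval; Baker: expectancy rises to the mean interval and then falls to a low level). Neither author proposed these equations, and the function names and the `floor` parameter are hypothetical.

```python
def deese_expectancy(elapsed: float, mean_interval: float) -> float:
    """Assumed monotone form: expectancy grows with time since the last
    signal and continues to grow past the mean intersignal interval."""
    return elapsed / (elapsed + mean_interval)


def baker_expectancy(elapsed: float, mean_interval: float, floor: float = 0.1) -> float:
    """Assumed peaked form: expectancy rises linearly to a maximum at the
    mean intersignal interval, then declines toward a low floor value."""
    if elapsed <= mean_interval:
        return floor + (1.0 - floor) * elapsed / mean_interval
    return floor + (1.0 - floor) * mean_interval / elapsed


mean_interval = 10.0  # seconds, matching the 10-second series described above
for elapsed in (2.0, 5.0, 10.0, 20.0, 30.0):
    d = deese_expectancy(elapsed, mean_interval)
    b = baker_expectancy(elapsed, mean_interval)
    print(f"{elapsed:5.1f} s  Deese-like: {d:.2f}  Baker-like: {b:.2f}")
```

Under these assumed shapes, the two views agree for intervals shorter than the mean and diverge only beyond it, which is why intervals longer than the mean are the informative test cases.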

The range of intersignal intervals 
was related to the occurrence of 
decrement, and apparently Baker has 
been the first to demonstrate this 
phenomenon. Using a simulated
PPI display, Baker (1958) found no
decrement when the intersignal intervals
ranged from 36-196 seconds, but
in a later study (Baker, 1959a), using 
the same task, found a decrement 
when the interval range was increased 
to 45-645 seconds. Interestingly, this 
latter range of intervals was that 
used by Mackworth in generating his 
well-known decrements in detection 
with several different visual and 
auditory monitoring tasks. In an- 
other experiment, using a simulated 
B-scan radar display, Baker (1959b) 
assessed the effects of complete signal 
regularity with an occurrence every 
2½ minutes, a random series with a
range of intervals from 1 to 6 minutes, 
and a wide range of intervals where 
the spread was }-10 minutes (ran- 
domly arranged). Signal frequency 
was the same for all groups, being 24 
an hour. Decrement was found only 
for the group with the widest range 
of intersignal intervals. Baker (1959c)
would interpret this as the subject
abandoning any efforts to form expectancies
because it is done too
imprecisely when intervals are long. 

Knowledge of results. This variable 
prevents a decrement by allowing an 
accurate perception of the sequence 
of stimuli. Baker (1959b) tested 
three groups of subjects given (a) no 
information; (b) complete information
on correct, missed, and false
signals; and (c) repetition of a missed 
signal at five-second intervals until 
detected. The task was visual detec- 
tion, the observation time 1 hour, and 
the signal rate 24 an hour. Only the 
group with no feedback had a signifi- 
cant decrement. Mackworth (1950) 
earlier had found that informing an 
observer of his success or failure in 
detecting a signal served to com- 
pletely eliminate the decrement in 
detection. Pollack and Knaff (1958) 
obtained results similar to Mack- 
worth’s. 

Knowledge of signal location on a 
visual display. Knowledge of signal 
location makes confirmation of ex- 
pectancy more likely. With increas- 
ing variability in location the ap- 
propriate part of the search area may 
not be scanned when the signal occurs. 
This leads to a lower apparent signal 
frequency and lower probability of 
detection. While Baker is concerned 
only with spatial variables as they 
influence temporal expectancy, Mack- 
worth (1950, pp. 58-59) implies a 
spatial expectancy for signal occur- 
rence at one or more locations as a 
state distinct from temporal expectancy.
Deese and Ormond (1953)
varied the distribution of signals on 
a radar display presenting 50% in 
one quadrant. Detection in the high 
probability quadrant was only slightly 
superior to the other three during 
an hour period, but the overall prob- 
ability was very high. In a similar 
experiment, Nicely and Miller (1957) 
found greater detection in the more 
frequent quadrant, the difference 
increasing in the last half-hour of
the 74-minute watch. Level of de- 
tection in the high probability area 
remained relatively constant through- 
out. Bartlett et al. (1955) used the 
method of constant stimuli to meas- 
ure brightness thresholds. Knowing 
the time but not the location of a 
signal led to higher thresholds than 
knowing both the time and place of 
occurrence. A much larger decrement 
in performance was observed when 
neither time nor location was known. 
Krendel and Wodinsky (1959) meas- 
ured time to detect a randomly lo- 
cated light signal when the time of 
onset was known. They found no 
decrement in search time over an 
hour. Finally, Garvey et al. (1958) 
reported that no increase in stimulus 
intensity was necessary to detect a 
signal appearing after 60 minutes of 
monitoring provided the observer was 
warned of signal time and location 
before it appeared. Without such
knowledge observers showed a large
increase in visual threshold. 


Signal intensity. Baker holds that 
expectancy is more likely to be con- 
firmed with more intense signals. 
Both Adams (1956) and Mackworth 
(1950) found higher probability of 
detection for visual signals of higher 
intensity than lower intensity. And, 
if we assume that perceived signal 
intensity is related to the duration of 
the signal, Adams (1956) found a 
higher probability of detection for 
signals of 2 seconds in length than for 
signals of 1 second. 

Looking at the combined efforts of 
Deese and Baker in forwarding the 
expectancy hypothesis we again have 
a theory at the early stages of develop- 
ment making qualitative predictions 
about vigilance behavior. The under- 
lying assumptions were set forth 
more clearly than was the case with 
Broadbent’s attention hypothesis 
with the result that the expectancy 
hypothesis lends itself more readily
to testing. A case in point is Baker’s 
report (1959b) initiated with the in- 
tention of evaluating the model. The 
expectancy hypotheses do not grapple 
explicitly with the classical vigilance 
issue of decrement accruing over ob- 
servation time, which might be listed 
under long range effects, and distin- 
guished from short range effects where 
momentary determiners of response 
(intersignal interval, spatial location 
of the signal, etc.) are emphasized. 
These latter effects intrigue expect- 
ancy theorists. Other variables 
known to be important are largely 
neglected by Deese and Baker, e.g., 
rest periods and environmental fac- 
tors such as presence of the experi- 
menter, interpolated messages, and 
noise. Oddly enough, Deese (1955) 
devoted some space to the importance 
of varied background sensory input in 
maintaining vigilant behavior but he 
did not relate it to expectancy and 
thus might be said to have a two- 
factor theory. Baker (1959c) men- 
tioned environmental factors as pos- 
sible distractions that would lower 
apparent signal frequency, but these 
factors were not formally entered into 
his theory. 


VARIED SENSORY ENVIRONMENT 


Scott (1957) explored Hebb’s thesis 
(1955) that stimuli serve a dual func- 
tion: (a) they have a cue function in 
controlling goal responses (the func- 
tion usually ascribed them in learning 
theories), and (b) an arousal or vigilance
role to which Hebb (1955)
ascribes motivational properties. 
Scott feels that the arousal function of 
stimuli has been largely ignored and 
should be given more attention. 
Broadbent (1958) calls this the 
“activationist hypothesis” in vigilance
research, and his views (Broadbent,
1953) on stimulus variety were
somewhat similar. To document 
implications of arousal for vigilance,
Scott surveyed the literature con- 
cerned with performance deteriora- 
tion in a variety of repetitive tasks 
with particular attention to the uni- 
formity of sensory environment that 
accompanied such activities. He 
concluded that loss of efficiency was 
directly related to reduction in stimu- 
lus variation. When background 
stimuli are at a minimum and only 
occasional and often low key critical 
stimuli are present, rapid deteriora- 
tion should be expected. The more 
unchanging are the critical stimuli, 
the sooner deterioration will occur. 
Rest periods and introduction of 
extraneous stimuli serve to increase 
the variety of stimulation needed to 
maintain or restore efficient behavior. 

Neurophysiological as well as behavioral
research supports the importance
of a secondary role for
stimuli. Impulses from the same 
sensory stimuli have been shown to 
reach the cerebral cortex via two 
different pathways. They travel 
directly along the sensory tract to 
the corresponding nucleus in the 
thalamus and terminate in a specific 
projection area of the cortex. A sec- 
ond pathway has been studied, 
wherein impulses from the same 
stimuli travel a slow circuitous route 
through the ascending reticular acti- 
vating system which discharges a 
diffuse bombardment over wide areas 
of the cerebral cortex. The latter 
type of cortical stimulation is consid- 
ered necessary for the maintenance 
of alert behavior. Scott (1957), Hebb 
(1955), Lindsley (1957), Malmo 
(1959), and Samuels (1959) summarize
the experiments related to
this work. 

Given the nonspecific effect of 
stimuli on behavioral organization, 
Scott suggested that stimuli lose 
their nonspecific effects with con- 
tinued exposure, the rate of such 
habituation increasing as the environment
is more uniform. This process,
termed “sensory habituation,” results
in a wide range of modifications in 
behavior of which loss of vigilance is 
one of the earliest to appear. Under 
conditions of severe isolation over
extended periods more serious symp- 
toms such as hallucinations appear, 
as in the McGill studies of sensory 
deprivation. 

The sensory habituation theory 
finds application to vigilance tasks in 
a number of ways. One would expect 
to find performance restored to or 
maintained at a higher level under 
conditions which increase the variety 
of either peripheral or relevant task 
stimuli. Examples cited by Scott 
(1957) included: rest periods, high 
signal rate, knowledge of results, 
interpolated messages, use of tasks 
with multiple stimulus sources, and 
presence of the experimenter. Data 
relevant to most of these factors have
been summarized in earlier sections, 
with the work of McFarland et al. 
(1942) being particularly relevant. 
The results from vigilance tasks with
multiple stimulus sources support
Scott’s position quite well and, in
fact, his is the only successful theory
in this vein (Broadbent, 1950; Hoffman
& Mead, 1943; Howland, 1958;
Jerison & Wallis, 1957a, 1957b;
Jerison & Wing, 1957; Loeb & Jean- 
theau, 1958). All of these studies 
have the distinctive feature of show- 
ing no vigilance decrement whatso- 
ever—a puzzling but consistent find- 
ing that has received little systematic 
attention. Most investigators have 
chosen to use tasks where decrement 
is known to occur and largely have 
ignored the potential for understand- 
ing vigilance behavior that might be 
found in studying the tasks that fail 
to yield detection decrement. For 
example, one might entertain the 
hypothesis that the sensory inputs 
sustaining responsiveness might arise
from the proprioceptive stimulation 
derived from head and eye move- 
ments. It appears that under some 
experimental conditions, not yet 
clearly defined, task complexity and
variety eliminate vigilance decrements
in most cases. A negative instance
is Garvey et al. (1959), where
a decrement was found with a multi- 
dial task using a very low signal rate. 
Their task was somewhat different 
from the conventional vigilance task 
because they required the observer 
to detect a larger deviation occurring 
to a constantly moving needle in each
dial. The very low signal rate (an average
of 2.5 signals per 2 hours) may be the reason
for the difference because Howland 
(1958) used the same general type of 
task and found no decrement. 

Scott was not proposing a theory of 
vigilance in his paper, but the rele- 
vance of his view to this area is clear. 
He provided convincing evidence for 
the presence of perceptual variation 
as a necessary condition in maintain- 
ing alertness. Although Mackworth 
(1950) and Deese (1955) have noted 
the importance of such variation, this 
point has not been formally incorpo- 
rated in any of the models dealing 
specifically with vigilance. It is 
worthy of attention. 


OBSERVING RESPONSES 


The analysis of vigilance behavior 
in terms of rate of observing responses 
is not a theory but a technique for 
studying vigilance. Theory enters 
the picture only in the assumption 
that detection of a signal serves as a 
reinforcement for the observing re- 
sponse (Holland, 1958). Holland 
(1957, 1958) has been the major pro- 
moter of this type of analysis. His 
purpose was to show the influence of 
schedules of reinforcement on rate of 
observing response and the parallel 
influence on detection performance. 
The extent to which rate of observing
and probability of detection follow 
the same course would determine how 
much of the detailed knowledge about 
schedules of reinforcement (e.g., 
Ferster & Skinner, 1957) could be 
carried over directly to vigilance 
behavior. The observing response 
studied by Holland was that of press- 
ing a key to illuminate a dial. The 
pointer on the dial deflected from the 
null position at intervals set by the 
schedule of reinforcement and re- 
mained deflected until the observer 
reset the pointer by pressing a second 
key. The observer could only see the 
dial by pressing the key. 
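
The contingency just described can be sketched in a few lines of code. This is only an illustration, not Holland's apparatus or his analysis: the function name `detections_on_fixed_interval`, the press times, and the rule that each new interval is timed from the moment the pointer is reset are all assumptions made for the example.

```python
def detections_on_fixed_interval(press_times, interval):
    """Return (press_time, latency) pairs for presses that reveal a deflection.

    press_times: sorted times (seconds) of observing responses (key presses).
    interval: fixed interval (seconds) from a reset to the next deflection.
    Assumption (hypothetical): the observer resets the pointer as soon as the
    deflection is seen, and the next interval is timed from that reset.
    """
    detections = []
    next_deflection = interval  # first deflection comes one interval in
    for t in press_times:
        if t >= next_deflection:
            # The pointer has been deflected since next_deflection; this press
            # illuminates the dial, so the signal is detected (reinforcement).
            detections.append((t, t - next_deflection))
            next_deflection = t + interval
        # Earlier presses show the pointer at null: no reinforcement.
    return detections


# Hypothetical record of key presses on a 60-second fixed interval.
presses = [10, 30, 55, 65, 100, 130, 140]
print(detections_on_fixed_interval(presses, interval=60))
```

Under these assumptions only the presses made after the pointer has deflected are reinforced, which is why the temporal pattern of pressing, and not just its overall rate, is of interest.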

Holland studied rate of observing
as a function of several common reinforcement
schedules to test the assumption
that detection serves as a
reinforcement. On fixed interval
schedules ranging up to 4 minutes,
observers learned temporal discriminations
reflected by “scallops” in the
cumulative response curves during
the last of eight 40-minute sessions.
During extinction the rate remained 
high for a time and then gradually 
decreased to a low level. 

Following a fixed ratio schedule 
with ratios increasing from 36-200 
responses per reinforcement, Holland 
reported a higher rate of observing 
with higher ratios. Extinction curves 
were typical in showing spurts of high 
responding in a jagged decline in rate. 
These results along with successful 
training on multiple schedules and 
responding at low rates led Holland 
to conclude that signal detection 
could serve as a reinforcement for 
observing responses. His next step 
was to use schedules of signal presen- 
tation identical to those in typical 
vigilance tasks. 

Rate of observing was measured on 
variable-interval schedules with aver- 
age intervals of 15 seconds, 30 seconds, 
1, 2, and 3 minutes. These covered 
the range in terms of signal rate from
20-240 signals per hour. Observing 
rate was higher with higher signal 
rates. The 3-minute interval led to a 
decline in rate over time paralleling 
the decrement in percentage of signals 
detected reported by Deese and 
Ormond with a corresponding 20 
signals per hour. Among the other 
values studied, both 15-second and 
30-second intervals led to an increase 
in response rate with time while a 2- 
minute interval showed a decline. 
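
The correspondence between the average intervals and the quoted range of 20-240 signals per hour is simple arithmetic, sketched below. The helper name `signals_per_hour` is an illustrative assumption; only the five interval values come from the text.

```python
def signals_per_hour(mean_interval_seconds):
    """Convert a mean intersignal interval (seconds) to signals per hour."""
    return 3600 / mean_interval_seconds


# Average intervals of the variable-interval schedules named in the text.
mean_intervals = {
    "15 seconds": 15,
    "30 seconds": 30,
    "1 minute": 60,
    "2 minutes": 120,
    "3 minutes": 180,
}

rates = {label: signals_per_hour(sec) for label, sec in mean_intervals.items()}
for label, rate in rates.items():
    print(f"{label}: {rate:.0f} signals per hour")
```

The shortest interval gives the upper end of the range and the 3-minute interval gives the 20 signals per hour used in the comparison with Deese and Ormond.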

In another experiment using Mackworth’s
(1950) schedule (¾, ¾, 1½, 2, 2,
1, 5, 1, 1, 2, 3, and 10 minutes), the
signal was transient, allowing measurement
of both percentage detection
and rate of observing over a 2-hour
period. The similarities to Mackworth’s
results were good. Holland
found that 39% of his observers missed
one or no signals; Mackworth found
29%. Separating the “good” observers
from the “poor” observers,
Holland plotted separate curves for
the two groups relating percentage
detected and rate of observing as a
function of time in half-hour periods.
The poor observers showed a sharp 
decline over the first half-hour in both 
measures, with a continued gradual 
drop until the last period where some 
recovery occurred. For the good ob- 
servers percentage of signals detected 
did not decline and observing rate 
increased according to a negatively 
accelerated function of time. 

Holland supplements his paper 
with analogies between findings of 
vigilance studies and animal studies 
which use the Skinnerian cumulative 
response frequency method of record- 
ing defended by Holland. He cited 
Brady (1956) as finding higher re- 
sponse rate for rats under Benzedrine, 
analogous to Mackworth’s result 
(1950) of increased signal detection. 
Ferster and Skinner (1957) produced 
different rates of response using 
multiple reinforcement schedules in
the same session with animal subjects 
and this corresponds to Nicely and 
Miller’s (1957) result of higher prob- 
ability of detection for the quadrant 
on a radar scope having a higher 
signal rate. Holland also discusses 
increased rate of responding following 
rest for both animals and vigilance 
studies. Holland further cites the 
evidence that higher room tempera- 
tures produce lower response rates in 
animals, quite analogous to Mack- 
worth’s finding (1950) that detection 
is lowered under these circumstances. 

The method of studying vigilance 
behavior through observing responses 
is subject to criticism on several
grounds. Requiring an overt response 
such as key pressing introduces an 
element into the situation that is not 
present in free scanning vigilance 
tasks. There is the implicit assump- 
tion not only that the viewer looks at 
the display every time he presses the 
key, but also that this is the same 
scanning response that would occur if 
the subjects were not required to 
press the key. An equally reasonable 
interpretation is that the subject 
presses the key rapidly in order to 
keep the display illuminated so he 
can scan when he wants to, and the 
very high rates of responding (Hol- 
land, 1957, 1958) suggest this as the 
case. Furthermore, repetitive rapid 
pressing of a key can produce work 
inhibition or fatigue. Changes in rate 
of responding under these circum- 
stances would not necessarily reflect 
the same laws of observing if head 
and eye movements were to be meas- 
ured directly and motor fatigue was 
trivial. A paper by Blair (1958) de- 
scribed an observing response that 
makes the correspondence to normal 
scanning more likely than in the case 
of key pressing, at least for the mov- 
ing head component of visual scan- 
ning. The operator was in a darkened 
room and had a continuous light
source on his head that had to be 
directed at the display to see the 
critical signal. A light-sensitive 
germanium diode activated a recorder
whenever the operator
“looked” at the display, thereby giving
a complete record of frequency
and duration of observing responses. 
Only two of his five subjects exhibited 
Holland’s finding of increased ob- 
serving as signal time approached, 
suggesting difficulties that could be 
devastating for Holland’s equivalence 
of Skinnerian observing responses 
and sense receptor orientations. 
Mackworth and Mackworth (1958) 
reported a precise method of measur- 
ing eye fixations with closed circuit 
television methods. The direct meas- 
urement of eye movements (head 
movements are excluded by the television
method) under conditions
where the observer is allowed to scan
a display freely can be used to test
the hypothesis that more remote responses
such as pressing a key are
equivalent and yield the same laws of 
observing. Until such verification is 
made the indirect approach to the 
study of vigilance must be viewed 
with some reservation. Perhaps this 
potential source of difficulty in de- 
veloping laws for observing responses 
arose from applications of Wycoff’s 
(1952) general definition that held the 
observing response to be that be- 
havioral act which produces the dis- 
criminative stimuli correlated with 
reinforcement. Thus, orienting the 
eyes and head to receive stimuli being 
emitted from a display would be an 
example of an observing response, 
and Wycoff freely uses this example, 
but it is clear that his definition is 
broadly conceived and includes any 
response which produces the dis- 
criminative stimuli for the organism. 
Prokasy (1956) and Lutz and Perkins 
(1960) followed Wycoff's lead and
studied observing responses which 
were not the sense receptor orienta- 
tions so important in vigilance re- 
search. Holland (1957, 1958), how- 
ever, proceeds one step further and 
interprets the observing response of 
key pressing to illuminate the display 
in a vigilance experiment as having
direct correspondence with sense
receptor orientations and as yielding the
same behavioral laws. This is an ad
hoc assumption which can be, and 
must be, proved empirically in the 
laboratory. 


CONCLUSIONS AND SUMMARY 


The main shortcoming of our con- 
temporary theories of vigilance would 
seem to be a casualness of formulation 
that makes the definitive testing of 
implications rather difficult. The 
inhibition hypothesis has an implicit 
organization provided by the classical 
conditioning paradigm, a wealth of 
relationships derived from the study 
of responses in the classical condition- 
ing situation, and the several theo- 
retical explanations of classical condi- 
tioning, but this framework for the 
vigilance problem appears to be more 
of an analogy than a scientifically 
useful theoretical system that is 
capable of rigorously accounting for 
the present facts and predicting new 
experimental findings. Even if we 
grant the classical conditioning 
schema a higher status than analogy, 
its capabilities for relating to the 
known data of vigilance experiments 
are limited. For example, Mack- 
worth (1950) sees the period of con- 
tinuous observation in a vigilance 
task as a period of experimental 
extinction. As this extinction period 
continues the expectation would be 
for a steadily decreasing probability 
of occurrence for the detection re- 
sponse. Yet, this typically is not the 
case. Mackworth’s own data showed
the probability of detection function 
stabilizing at an intermediate level, 
with no apparent trend toward com- 
plete extinction, and Mackworth 
acknowledged this difficulty by sug- 
gesting that other factors such as 
expectancy and self-instructions
might be required to account for the 
data trends. Additional explanatory 
shortcomings of classical conditioning 
are found in the high level of respon- 
siveness induced by a high signal rate 
(the classical conditioning schema 
would predict just the opposite be- 
cause high signal rate would consti- 
tute many massed extinction trials), 
the effects of intersignal interval, and 
the general failure of detection decre- 
ment to occur in complex tasks with 
multiple stimulus sources. With all 
of these weaknesses, it is doubtful 
whether the inhibition hypothesis 
deserves serious attention in any
efforts directed toward developing a
satisfactory theory. It is also doubtful
whether Broadbent’s attention
hypothesis can be refined sufficiently 
to account for all of the known find- 
ings and to provide new deductions 
that can be given decisive experi- 
mental evaluation in the laboratory. 
The loose structure of Broadbent's views
generally has prevented them from
being instruments to guide the experi- 
ments of laboratory workers in this 
area, and it is difficult to see them 
ever becoming useful unless a basic 
rephrasing of their tenets is under- 
taken to give the precision that a 
science asks of a theory. 

The expectancy hypothesis cannot 
be said to have a precise expression 
but certainly it has been a good heur- 
istic device in stimulating a number 
of experiments. Deese’s initial expression
of expectancy (1955) was a broad
one and was based on the empirical 
relations between the response measure
of detection probability on the
one hand and independent task vari- 
ables of signal rate and intersignal 
interval on the other. The principal 
research on expectancy, however, has
been by Baker (1958, 1959a, 1959b, 
1959c) who has revised the Deese 
formulation of expectancy and in his 
research has mainly concentrated on 
the short range variations in per- 
formance as a function of intersignal 
interval rather than the overall detec- 
tion level based on signal rate. While 
Deese was unable to verify his hy- 
pothesis about expectancy and the 
intersignal interval, Baker found 
some support for his elaborated ver- 
sion of expectancy which he related 
to more variables than Deese. Baker’s 
expectancy hypothesis is quasiquali- 
tative at this time but his approach 
is amenable to quantitative expres- 
sion. It is not complete, however, 
because it does not try to account for 
decrement as a function of observa- 
tion time, marked gains over rest, or 
the typical absence of decrements for 
multisource complex tasks. 

The sensory variation or activa- 
tionist hypothesis is secured in pro- 
vocative physiological hypotheses 
about the role of the ascending reticular
activating system in maintaining
responsiveness and, by appealing to a
proposed organismic requirement for 
stimulus variation if performance 
level is to remain high, most of the 
facts of vigilance research can be ex- 
plained in a general way. Behavior 
theories have always emphasized the 
guiding role of stimuli acquired 
through learning, and properly so, 
but these recent physiologically-based 
hypotheses stress that stimuli have a 
maintaining function for the response 
too. Thus, the monotony of the 
vigilance situation is interpreted to 
be an absence of stimulus variety 
needed to maintain the response level,
and variables such as high signal rate, 
rest, knowledge of results, task com- 
plexity, etc., are taken to be opera- 
tions which promote stimulus varia- 
tion and high responding. This 
activation hypothesis is useful after- 
the-fact but it remains to be expressed 
carefully before-the-fact so that dif- 
ferential prediction can be made. 
These clues derived from the physio- 
logical level are suggestive but they 
are insufficient by themselves. The 
theorist still has problems at the 
molar level where type and amount 
of external stimulation must be re- 
lated to overt behavior. Ideally it is 
desirable to coordinate the molar con- 
cepts and functions with those at the 
physiological level, and the increased 
vigor of physiological research sug- 
gests that these echelons of organ- 
ismic action eventually will be inter- 
related. But with all of our present 
vigilance data at the molar level, it 
would be fruitful at this time to look 
for an expression of the stimulus 
variation hypothesis in terms of 
stimulus control of responses for the 
whole organism. The possibilities for 
dimensions of stimulation, both on 
the external environmental and the
response-induced side, seem endless. 
On the environmental side, the ex- 
perimenter might systematically vary 
the number of stimulus sources, their 
spatial array, or the type and level of 
extraneous stimulation like noise or 
music. On the response side, the 
level of stimulation could be manipu- 
lated by variables influencing pro- 
prioceptive feedback, such as extent 
of movement or physical variables of 
the control system considered to be 
related to proprioception. Response-produced
stimulation might also stem
from mediating responses and be 
related to the number of choices involved
in decision making. In the
beginning a formulation of this kind
could be a straightforward empirical
expression in terms of stimulus and
response without resort to intervening
variables, but ultimately these
variables would seem necessary for a
sophisticated theory of vigilance behavior.
As Hebb (1955) has suggested,
the relationship between stimulus
variation and responsiveness
may be a motivational one.

MENTAL ABILITY AND SOCIOMETRIC STATUS
AMONG RETARDED CHILDREN¹

As this review of recent research
will indicate, there is continuing in- 
terest among students of mental re- 
tardation in the relationship between 
sociometric (or peer choice) status 
and the level of ability of the person 
being chosen. Our aim is to review 
and evaluate representative investi- 
gations of this relationship among 
normal children, institutional retarded
children, and retarded children
attending regular or special
classes in public schools. But our
focus is on the institutional retardate.

The educational and institutional
importance of the relationship rests
on the notion that the interpersonal
environment is a powerful determinant
of development, and on the
further notion that the interpersonal 
environment of the child is com- 
posed predominantly of peer group 
relations. A given interpersonal en- 
vironment may be assessed as facilita- 
tive or restrictive of development. 
Thus against school and institutional 
standards of training, achievement 
and performance, it is important to 
know the extent to which the peer 
environment rewards or penalizes 
members differentiated on abilities. 

Like groups of adults, groups of 
children exhibit structures of dif- 
ferentiated preferences. Some mem- 
bers will be preferred more by others. 


1 This review was supported in part by 
United States Public Health Service Grant 
Number OM-111, and in part by a grant from 
the University of Kansas. We acknowledge 
the helpfulness of John de Jung, Gerald Siegel, 
and Ross Copeland in commenting critically 
on an earlier draft of this report. 

* Now at Dartmouth College. 


Obviously, there are many correlates 
of such structures, and these vary for 
any group according to the criterion 
used for expressing preference. A 
preference structure will usually have 
as correlates measures of homophyly, 
homogamy, propinquity, social con- 
formity, social initiative or domi- 
nance, as well as more esoteric as- 
pects of personal attractiveness. 

It is equally likely that the prefer- 
ence structure will reflect what is cul- 
turally valued within the group, as 
Riecken and Homans (1954) suggest. 
If academic achievement is valued 
by school children for example, the 
ablest peers (academically) will tend 
to be “overchosen.” As other values
may compete with achievement for 
peer endorsement, the relation between
mental ability³ and sociometric
status will always be limited. The 
more intelligent children may also 
prove more competent in their ex- 
pression or realization of competing 
values, however, which leads one to 
expect a persisting relationship be- 
tween mental ability and sociometric 
status. This does not suggest that the 
correlation between ability and socio- 
metric status should be high, but 
rather that it should be ubiquitous. 


3 Mental ability is used throughout this 
paper interchangeably with intelligence. All 
the studies reviewed have used measures of 
intellectual performance of the individual as 
against measures of “potential.” In speaking 
of intellectual or mental ability, we hoped to 
avoid the many remedial, diagnostic, and 
other clinical connotations which accrue to 
the concept of the intelligence quotient. We 
are also aware, as our discussion indicates, of 
the need for a better conceptualization of 
mental ability. 
NORMAL CHILDREN


In separate studies of five different 
samples of children in grades from 
second through seventh, Bonney 
(1944) and Laughlin (1954) found 
positive correlations between in- 
telligence and sociometric status. In 
each case, the association was sig- 
nificant beyond the .01 level, but the 
coefficients (Pearson) were low, ranging
from .31 (N=299) to .27
(N=525). 

Grossman and Wrighter (1948) re- 
lated intelligence as measured by the 
Stanford-Binet to sociometric status 
in a class of sixth grade students. 
They reported that high status peers 
were significantly higher in intelli- 
gence. They concluded that the two 
variables were significantly related, 
but that high intelligence did not 
assure high status. 

Barbe (1954), Bonney (1946), and 
Potashin (1946) gave further con- 
firmation to these findings, but they 
also found that mutual choices 
tended to be made between children 
with similar levels of mental ability. 
To this extent, the general associa- 
tion between ability and status is re- 
duced insofar as children in the modal 
range of mental ability reflect higher 
sociometric status: there is a greater 
chance of being chosen for subjects of 
modal ability, and the range of status 
scores is restricted. 

Gallagher and Crowder (1957), in a
study of gifted children, found that
four out of five students with Stanford-Binet
IQs of 150 or above obtained
above average sociometric
status and that more than half scored 
in the top status quartile. Thus when 
relatively extreme cases are consid- 
ered, mental ability improves con- 
siderably as a predictor of sociometric 
status. 

Other representative or noteworthy
studies of the relation among
normal children are summarized in 
Table 1. When measures of specific 
types of performance such as reading 
comprehension, school achievement, 
motor skills, and social maturity 
have been correlated with sociometric 
status, coefficients of similar magni- 
tude have been obtained. These find- 
ings are reviewed in Gronlund (1959). 


RESEARCH ON INSTITUTIONAL 
RETARDATES 


Using the choice of “one best
friend” as a test criterion, Hays
(1951) investigated the problem with
a sample of 127 defective, borderline,
and dull-normal girls housed in a
single institutional dormitory. His
subjects ranged in age from 7 to 23
years, with a mean of 14. Intelligence
quotients and mental ages were derived
from Stanford-Binet tests. The
biserial correlation between dichotomized
“choices received” and “no
choices received” and IQ was .43
(p<.01).

Clampitt and Charles (1956) 
studied the relationship between 
sociometric status and supervisory 
evaluation of institutionalized men- 
tally deficient children. The 164 sub- 
jects, both girls and boys, ranged in 
age from 6 to 40 years. The mean age 
for girls was 19, for boys 14. At the 
time of the study, all subjects had 
been institutionalized for at least 1 
year, and the median term was 7 
years for girls and 3 years for boys. 
Most of the intelligence tests were 
Stanford-Binets. Sociometric choice 
and rejection responses were ob- 
tained in relation to the following ac- 
tivities: eating, playing, and working. 
Significant, positive rank order cor- 
relations were found between socio- 
metric status and MA, IQ, and 
supervisory evaluation based on 
selected traits. The correlation between
IQ and choice status for boys,
for example, was .34 (p<.01).


TABLE 1

SUMMARY OF MAJOR STUDIES OF NORMAL CHILDRENᵃ

Choice criterion (or criteria) as obtained through sociometric test.
(All choices within sample unless stated otherwise.)

Bonney (1944)
    Choice criterion: Take a trip with, vote for class librarian, vote for
        class officers, best friend, take picture with, number of valentines
        received, have as a partner for Easter party, expect to give
        Christmas presents, and best citizens and best leaders. No limit was
        made on the number of choices.
    Measures of mental ability: California Test of Mental Maturity (grade 2),
        Kuhlmann-Anderson (grades 3 & 4), Otis (grade 5), Pintner
        Intermediate (grade 6), Gates Primary Reading (grade 2), Stanford
        Achievement (grades 3, 4, 5, 6)
    Correlation coefficient and statistical test: .31-.45, Pearson

Bonney (1946)
    Choice criterion: Same as Bonney (1944)
    Measures of mental ability: Same as Bonney (1944)
    N: 201
    Correlation coefficient and statistical test: .28-.44, Pearson

Gallagher & Crowder (1957)
    Choice criterion: 5 best friends
    Measures of mental ability: Stanford-Binet, WISC, Stanford Achievement
        Test
    Correlation coefficient and statistical test: .45, contingency

Grossman & Wrighter (1948)
    Choice criterion: First three choices for: sit near, walk home, play,
        class officer, and best friend
    Measures of mental ability: Stanford-Binet, Stanford Achievement Test
    Correlation coefficient and statistical test: “Usual low rectilinear
        relationship”

Laughlin (1954)
    Choice criterion: One of your best friends, choose a group member but
        not a close friend, choose someone to be with once in a while, don’t
        mind this individual in group but do not want to have anything to do
        with him, and wish individual were not in the group
    Measures of mental ability: Detroit Alpha Intelligence Test,
        Metropolitan Achievement Test (partial battery)
    Correlation coefficient and statistical test: .27-.31, Pearson

Potashin (1946)ᵇ
    Choice criterion: First three choices for: favorite activity, classroom
        project, best friend in class, and best friend out of class
    Measures of mental ability: Dominion junior and intermediate group tests

Rosenthal (1956)ᶜ
    Choice criterion: First three choices for: play, sit next to, invite to
        a party, and go to a show
    Measures of mental ability: Kuhlmann-Anderson language measures

ᵃ Barbe (1954) in his study of 244 normal children gives only percentages
of intelligence levels of friends chosen by bright and slow learning
children. No correlation test could be performed as distribution by IQ
level was not given.
ᵇ Mean differences between friends (mutual choices) and nonfriends were
compared as to mental age and intelligence. The difference in the MA and IQ
of friends and nonfriends is 1.2 and .2, respectively; the reliability of
the difference in the MA and IQ is .32 and .06, respectively.
ᶜ t tests compared high and low sociometric groups on 10 language measures.
On six of the tests the groups differed significantly (p<.05), and on four
of the tests nonsignificantly (p<.10).


TABLE 2

SUMMARY OF MAJOR STUDIES OF RETARDED CHILDREN

Choice criterion (or criteria) as obtained through sociometric test.
(All choices within sample unless stated otherwise.)

Institutional retardates

Clampitt & Charles (1956)
    Choice criterion: Three choices for: eating, playing, and working
    Measures of mental ability: Stanford-Binet
    Correlation coefficient and statistical test: .34, Spearman

Dentler & Mackler (1960)
    Choice criterion: Three (or more) choices for: play, work, to be, and
        not want to play with
    Measures of mental ability: Porteus Maze
    Correlation coefficient and statistical test: .50, Pearson

Farber & Marden (1958)
    Choice criterion: Three best friends
    Measures of mental ability: Stanford-Binet
    Correlation coefficient and statistical test: .40, Spearman

Hays (1951)
    Choice criterion: One best friend
    Measures of mental ability: Stanford-Binet
    Correlation coefficient and statistical test: .43, Biserial

McDaniel (1960)
    Choice criterion: One choice for: sit next to at lunch, sit with at
        movie, play, and work
    Measures of mental ability: WISC
    Correlation coefficient and statistical test: .35, Spearman

Sutherland, Butler, Gibson, & Graham (1954)
    Choice criterion: Two choices for: working, eating, recreational
        periods, spare time activities, best friend, take with you on
        discharge, prefer to discuss plans and troubles, and associate with
        after marriage
    Measures of mental ability: Stanford-Binet
    Correlation coefficient and statistical test: .34, Pearson

Noninstitutional retardates

Johnson (1950)ᵃ
    Choice criterion: Three choices for: like, sit next to, and play
    Measures of mental ability: Stanford-Binet (for retarded members, N=39),
        Vineland Social Maturity, New California Short-Form Test of Mental
        Maturity
    N: 688

Turner (1958)ᵇ
    Choice criterion: Three choices for: sit next to and play
    Measures of mental ability: Otis Quick Scoring Mental Ability Test,
        Vineland Social Maturity
    N: 390

ᵃ t tests compared typical group with mentally handicapped group as to
acceptance (t=4.10, p<.01) and rejection (t=4.94, p<.01).
ᵇ t tests compared high chosen group with low chosen group on ability
(t=2.1, p<.05) and social maturity (t=2.78, p<.01).

Farber and Marden (1958) studied 
the social organization of a boys’ unit
at a state school for the mentally de- 
ficient. The sample contained boys 
ranging in age from 11 to 19. The 
mean age was 14.7. The length of 
residence ranged from 4 months to 
nearly 15 years. Only boys for whom 
Stanford-Binet IQ scores were avail- 
able were included. For the 77 sub- 
jects, status ranks were ascertained 
by interviews and questionnaires. A 
rank correlation of .40 (p=.01) was 
found between IQ and status. 

Sutherland, Butler, Gibson, and
Graham (1954) made a sociometric
study of one cottage of retarded females.
The 205 subjects ranged in
age from 18 to 53 years. The published
report did not indicate the
length of institutionalization for the
cottage members. The Stanford-Binet
(Form M) was used as the
ability measure. For their subjects,
Sutherland et al. found a Pearson
coefficient of correlation of .34
(p<.01) between intelligence and
choice status. To illustrate the relationship,
Sutherland et al. compared
high and low status subjects. The
mean IQ of the high status group was
62.6 compared with a mean of 42.8 for
the lows. Of the highs, 87% were
above 70 IQ, while in the low status
group 81% were below 70 IQ.

These four studies on institutional
retarded children are corroborative.
They compare point for point with
studies of normal children and with
one another in finding a positive, significant
yet weak association between
intelligence and sociometric status.

McDaniel (1960) studied 15 retardates,
3 women and 12 men, who
ranged in age from 16 to 32 years,
with a mean age of 19. At the time of
the administration of the first sociometric
test, the group had been in
existence for 8 months. The study
did not indicate the length of insti- 
tutionalization, however. The mean 
IQ for the group, based on the Full 
Scale WISC test, was 52. The Spear- 
man rank correlation between IQ and 
sociometric status was .35 (p<.10). 
Although this coefficient fails to differ 
significantly from zero at the .05 
level, it is better to consider the co- 
efficients under review in terms of 
magnitude and expected range. For 
example, it is likely that McDaniel 
would have found a correlation sig- 
nificant at p equal to or less than .05 
had he assessed a group with 30 
rather than 15 subjects; yet the 
actual magnitude of the coefficient 
would probably have fallen between 
.30 and .50. 
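
The sample-size argument can be checked with a short calculation. The sketch below is illustrative only: it treats the rank coefficient with the usual t approximation, t = r * sqrt((n - 2) / (1 - r^2)), and hard-codes critical t values for the .05 level from a standard t table; neither choice is taken from McDaniel's report, and the function names are hypothetical. Under these assumptions a coefficient of .35 falls short of significance with 15 subjects, whereas with 30 subjects the critical value drops to roughly .31 for a one-tailed test and .36 for a two-tailed test.

```python
import math

# Critical t values at the .05 level, copied from a standard t table
# (an assumption of this sketch). Keys are degrees of freedom, n - 2.
T_CRIT_ONE_TAILED = {13: 1.771, 28: 1.701}
T_CRIT_TWO_TAILED = {13: 2.160, 28: 2.048}


def t_from_r(r, n):
    """t statistic for testing a correlation r from n pairs against zero."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def critical_r(n, t_table):
    """Smallest correlation that reaches significance with n pairs."""
    df = n - 2
    t = t_table[df]
    return t / math.sqrt(t * t + df)


if __name__ == "__main__":
    r = 0.35  # coefficient reported for McDaniel (1960)
    for n in (15, 30):
        print(
            "n =", n,
            "t =", round(t_from_r(r, n), 2),
            "critical r (one-tailed) =", round(critical_r(n, T_CRIT_ONE_TAILED), 3),
            "critical r (two-tailed) =", round(critical_r(n, T_CRIT_TWO_TAILED), 3),
        )
```

The point of the comparison is the one made in the text: the coefficient itself is unchanged, and only the threshold it must clear moves with the number of subjects.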

McDaniel’s subjects exhibited very
little interaction and a restricted
range of choices. Considering the
“looseness” of this group’s sociometric
structure, the relation between
mental ability and sociometric
status McDaniel found is all the more
indicative of the weak but pervasive
character of differentiation by mental
ability. Even where group position
appears to have limited salience for
members, social preference depends
somewhat upon the demonstration of
skills of value in group activity.

Dentler and Mackler (1960, 1961)
investigated mental abilities
in relation to sociometric status
among 29 newly arrived boys in a
state school for retarded children.
The boys ranged in age from 6 to 12;
mean IQ on the Porteus Maze Test
was 56. After the first month of
residence, sociometric and psychometric
measures were taken. The association
between mental ability* and
sociometric status was .50 (p<.01,
Pearson r). In the second month,
repeated sociometric assessment
revealed a correlation between choice
status and ability of -.14. Further
analysis indicated that, under
increased pressure from aides to restrict
peer interaction and to induce
conformity to cottage regulations, group
structure was reorganized (at least
temporarily) so that sociometric
status became increasingly associated
with conformity.

* A full scale score was obtained from T
scale scores on the Porteus, the Parsons
Language Sample, and an index of social
maturity.


RESEARCH ON NONINSTITUTIONAL 
RETARDATES 


Johnson (1950) conducted a study 
in two communities in which there 
were no special classes for the men- 
tally retarded, thus assuring that all 
the educable, mentally retarded were 
in regular classrooms. Grades 1 
through 5 were sampled. He found 
that children diagnosed as mentally 
handicapped obtained significantly 
lower sociometric status scores than
the nonhandicapped. In addition,
Johnson found that sociometric
status was directly related to IQ and
that rejection scores were inversely
related to IQ.

Turner (1958) studied the sociometric
status of mentally retarded
children enrolled in special classes in
Negro elementary schools in North
Carolina. In all, 18 classes were
sampled and 390 children tested. Using
three measures to assess mental
ability (Table 2), Turner found that high
ability children were chosen 15 or
more times and the lows 3 times or
less, using roughly the top fourth and
the bottom fourth of the subjects
ranked on mental ability.


MEASURES OF SOCIOMETRIC STATUS 


The techniques and choice criteria 
used to measure sociometric status in 
the papers reviewed vary greatly. 
Barbe (1954) had teachers ask their 
pupils to nominate their three best 
friends, and gave equal weight to
each choice. Gallagher and Crowder 
(1957) asked for nomination of five 
best friends and ranked subjects by 
number of choices received. Bonney 
(1944, 1946), in marked contrast,
used event-specific criteria. His sub- 
jects chose peers with whom they 
would most like to have their pic- 
tures taken, those they would pre- 
fer to work with on a committee 
for a social event, have as partners 
for a trip to a packing house, and so 
forth, across three additional cri- 
teria. Moreover, number of valen- 
tines received on Valentine’s Day 
was tabulated and included as an in- 
dicator. Criteria were varied from 
class level to level. Scores were 
weighted, and number of choices was 
not limited. Resulting frequencies of 
choices received were calculated as 
proportions of totals. 

Grossman and Wrighter (1948) 
used 10 choice criteria. They were 
thus able to assess internal reliability 
and found a mean Spearman-Brown 
reliability coefficient for four samples 
of .95. Validity was checked by 
examining the fit between  socio- 
metric status and children elected as 
class officers. Scores were weighted
and the number of choices received on
each criterion was summed.
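
For readers unfamiliar with the Spearman-Brown coefficient, the following is a minimal sketch of the formula in its general form. The report summarized here does not say how the 10 criteria were divided, so the split-half case (k = 2) used in the example is an assumption, and the function names are hypothetical.

```python
def spearman_brown(r_part, k):
    """Predicted reliability of a composite k times as long as a part
    whose reliability is r_part (Spearman-Brown prophecy formula)."""
    return k * r_part / (1.0 + (k - 1) * r_part)


def implied_part_reliability(r_composite, k):
    """Invert the formula: reliability of one part implied by the
    reliability of the k-part composite."""
    return r_composite / (k - (k - 1) * r_composite)


if __name__ == "__main__":
    # Assumed split-half design: what correlation between halves would
    # step up to the reported mean coefficient of .95?
    half = implied_part_reliability(0.95, 2)
    print(round(half, 3), round(spearman_brown(half, 2), 3))
```

Under the split-half assumption, a stepped-up coefficient of .95 corresponds to a correlation of about .90 between the two halves of the criteria.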

In studies involving institutional 
retarded children, measures must be 
fitted to surmount illiteracy as well as 
other handicaps. Clampitt and 
Charles (1956) used three choice cri- 
teria: eating associates, playmates, 
and workmates. Choices were elicited 
in interviews, and probes were used to 
clarify communication and to insure 
at least three choices on each criterion.
Number of choices was summed across
the three criteria, with rejections be- 
ing weighted negatively. Farber and 
Marden (1958) interviewed their sub- 
jects and elicited unlimited nomina- 
tions of best friends. If a subject 
named only persons not included in
the sample, he was asked to name his 
best friends within the group. First, 
second, and third choices were 
weighted. Hays (1951) also used in- 
dividual interviews but asked only for 
choice of one best friend. Number of 
choices received ranged from zero to 
eight. Biserial correlation was neces- 
sary because of the extremely skewed 
distribution. McDaniel (1960) in- 
terviewed his institutional subjects 
but employed six criteria, including 
preferred associates to lunch with, sit 
with at the movies, play with, work 
with, help on a job, and persons nomi- 
nated as those with whom the sub- 
ject would not do any of these things. 
Only one choice was elicited on the 
first five criteria. Subjects were 
ranked by total number of choices 
received. Interestingly, no rejection 
nominations were made. McDaniel 
retested the group and obtained a 
response stability coefficient of .60
(Spearman). Sutherland and associates
(1954) also interviewed their
institutional subjects, eliciting two
choices on each of eight criteria. 
Most of these were identical to those 
used by McDaniel but best friends 
were nominated as well as choices of 
associates preferred after institu- 
tional release. Of all the studies of in- 
stitutional subjects, only Suther- 
land’s employed a probability model 
(Bronfenbrenner, 1945) to categorize 
subjects by status level. 

Dentler and Mackler (1960) at- 
tempted to simplify choice elicita- 
tion by using photographs of all 
group subjects randomly arrayed in 
rows and columns on a large but 
portable beaverboard. Children were 
interviewed under informal condi- 
tions and asked to point to their 
choices on four criteria: playmates, 
workmates, most want to “be,” and 
not want to play with. Number of 
choices received per criterion was 
normalized and resulting scores were
combined. 
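
One way such a composite could be formed is sketched below. The source says only that choices per criterion were "normalized" and "combined"; the use of z scores, the simple sum, and the negative sign given to the rejection criterion are all assumptions of this sketch, and the criterion labels and data are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(counts):
    """Standardize choices received on one criterion (assumed meaning
    of 'normalized'). counts maps child -> number of choices received."""
    values = list(counts.values())
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        return {child: 0.0 for child in counts}
    return {child: (c - m) / s for child, c in counts.items()}


def combined_status(choices_by_criterion, negative=("not_play_with",)):
    """Sum standardized scores across criteria, counting rejection
    criteria negatively (an assumption, not the authors' stated rule)."""
    totals = {}
    for criterion, counts in choices_by_criterion.items():
        sign = -1.0 if criterion in negative else 1.0
        for child, z in z_scores(counts).items():
            totals[child] = totals.get(child, 0.0) + sign * z
    return totals


if __name__ == "__main__":
    # Hypothetical tallies for three children on two criteria.
    data = {
        "playmates": {"A": 3, "B": 1, "C": 2},
        "not_play_with": {"A": 0, "B": 2, "C": 1},
    }
    print(combined_status(data))
```

Standardizing before summing keeps a criterion that draws many nominations from dominating one that draws few.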

In their studies of sociometric 
status of noninstitutional retarded 
children, Johnson (1950) and Turner 
(1958) asked for friendship nominations,
playmates, and classroom
seatmates. Both obtained their data 
through personal interviews but only 
Johnson secured rejection choices. 
Again, choices were not weighted, and 
summed number of choices served as 
the scale. 

DISCUSSION 

This review suggests that a mora- 
torium could be declared on studies of 
the relation between intelligence and 
sociometric status among children— 
a moratorium that ought to hold 
whether the children are gifted, nor- 
mal, or mentally retarded, and 
whether or not they are institutional- 
ized. The relationship has been dem- 
onstrated to hold, and to hold at a 
characteristic level, on samples rang- 
ing in size from 15 to more than 500, 
and across a wide age range. Its 
strength tends toward a constant 
value even where group relations 
have not become well established. 
Among retarded children in institu- 
tions, length of institutional residence 
appears to have little effect on the 
general relationship. The coefficients 
hold whether the intelligence test is 
the Stanford-Binet, the WISC, the 
California test, or the Porteus Maze, 
and whether it is individually or 
group administered. Finally, the 
relation is roughly the same whether 
IQ or mental age is used. 

Most of the studies employed the 
best available instruments for assess- 
ing school related aspects of intelli- 
gence. In research involving re- 
tarded children, however, it is im- 
portant to exploit the fact that a large 
variety of abilities exist and that 
some of these are probably more 
closely related to sociometric status
than others. For example, Rosenthal 
(1956) found that the language of 
children of high sociometric status 
was more active and moving and 
more varied. 

It should be possible to differen- 
tiate relations between a variety of 
abilities and a variety of types of 
sociometric statuses. Thus functioning
abilities such as performance
subscales on intelligence tests or motor
skills might be very highly associated 
with sociometric status on criteria 
involving leisure association or play- 
mate preferences. To avoid prema- 
ture closure of the question of how 
abilities relate to group status, at this 
stage it may be much more useful to 
consider discrete performance meas- 
ures and discrete status indicators. 

These considerations regarding the 
“true” range of group related abilities
apply also to studies of normal
children in classroom situations. As
Gronlund (1959) indicates, research
on measures of ability in relation to
sociometric status among normals 
has accumulated steadily since 1940, 
yet little has been done to develop 
measures of the kinds of abilities that 
might be assumed, on the basis of 
hypotheses about group structure, to 
have peculiar relevance as determi- 
nants of status. Most projects have 
employed instruments developed orig- 
inally by educational psychologists 
and clinicians for very different 
purposes. 

Some factors underlying the low 
correlation between mental ability 
and status are methodological; others 
are substantive. Only one of the 
studies reviewed made use of a proba- 
bility model for classifying students 
as high, medium, or low in status, 
suggesting that much of the pre- 
sumed differentiation between sub- 
jects may be due to little more than 
chance or error variance. Similarly,
only one study employed methods of 
computation which took into ac- 
count the ones making choices as well 
as how many choices were received. 
Heavy reciprocation in the modal 
range could distort and depress esti- 
mates of true association. Only one 
study assessed in detail the interrela- 
tions between results on the several 
choice criteria. A few more under- 
took evaluation of the reliability of 
the responses, though unfortunately, 
three of the studies treated repeated 
measurements of sociometric status 
as tests of reliability rather than as 
indicators of change. Users of socio- 
metric measures should accept the 
probability of change over time. The 
task of specifying the elements of 
change in test data that may be at- 
tributed to the actual changes in the 
variable under study, as opposed to 
change that must be attributed to 
unwanted or chance fluctuation in
the test, is a task that has not been 
undertaken in the research under re- 
view. 

Despite the wide range of work at- 
testing the reliability and validity of 
diverse measures of sociometric 
status (Mouton, Blake, & Fruchter, 
1960a, 1960b), there is little doubt 
about the need for clarification of the 
concept of sociometric status. There 
is agreement that sociometric status 
gives an indication of the differential 
value peers tend to place on each 
other. There is also agreement that 
groups have standards against which 
such valuation is made. To this ex- 
tent, clarification by empirical means 
should be possible through closer at- 
tention to these norms or standards. 
There is no good basis for demanding 
that choice criteria should be highly 
intercorrelated or even highly stable 
over time, but there is theoretical 
basis for investigating the fit between 
criteria and group standards. For
example, none of the studies reviewed 
found length of residence in the in- 
stitution to be a qualifying variable, 
yet one of the concerns of the
sociometrist should be the analysis of
“institutional effects.” How do groups of
children within institutions develop 
structure, and how are these struc- 
tures influenced by the crucial fact of 
their location within an institutional
culture?


Length of residence may be
determinative if approached
longitudinally. Group structures are
emergents; thus, sociometric status 
within an institutional cottage which 
has just formed may differ greatly 
from status in a cottage that has en- 
dured for years. The problem is one 
not of length of residence of individ- 
uals perhaps, but of duration of the 
group, as McDaniel (1960), Farber 
and Marden (1958), and Dentler and 
Mackler (1960) suggest. 

The study by Farber and Marden 
(1958) points a path that should be 
followed. Beyond finding association 
between intelligence and status, these 
investigators demonstrated associa- 
tions between sociometric status and 
formal classification of institutional 
boys as educables, trainables, and 
working-boys; status and popularity; 
and status and history of delin- 
quency. They identified the bases on 
which institutional retarded children 
in at least one state school classify 
themselves. For example, their sub- 
jects distinguished between peers 
oriented to rehabilitation and those 
disposed toward a “custodial career.”
Sociometric status was shown to be 
associated with such classifications. 
So used, status serves as a key to un- 
derstanding the career paths pro- 
vided by the institution and chosen 
by the patients. Though their study 
does not treat changes in status,
Farber and Marden have developed
means for predicting the future be- 
havior of institutional retarded boys. 
Dentler and Mackler (1961), in 
studying changes in  sociometric 
status among newly arrived patients, 
found that as the culture of the insti- 
tution was absorbed, the relation be- 
tween mental ability and status came 
to depend increasingly upon stand- 
ards imposed on the group by aides. 
For at least a brief period, the usual 
positive relation between mental age 
and sociometric status was reversed. 
The mentally abler boys resisted the 
regulations most strongly and lost 
status as a result of deviance. 
Future sociometric research on the 
differentiation of members within 
children’s groups should specify with 
greater precision the nature of the 
performance or ability under assess- 
ment and the particular variety of 
status. This effort should be linked 
with collection of data relevant to the 
development, situation, and norma- 
tive content of the group. Global or 
general indicators should be abandoned.


SUMMARY 


A review of representative studies 
of the relation between ability and 
sociometric status among normal 
children, institutional mentally re- 
tarded, and noninstitutional retarded 
children, indicated high agreement 
with the generalization that individual
ability is positively and significantly
associated with choice
status. Studies of normal children
have demonstrated that this relation
holds whether the abilities assessed
are measures of mental age,
intelligence quotients, or quite different
measures of achievement, or social or
motor skills. Although significant,
the association is uniformly limited to
the .25 to .50 range.

Sociometric status studies of
institutional retarded children were
viewed as particularly important, as 
they provide access to the study of in- 
stitutional effects and practical eval- 
uations of social rehabilitation. Socio- 
metric research on institutional chil- 
dren has been limited to correlation 
analysis of relations between socio- 
metric status and school-type intelli- 
gence tests, length of residence, and 
age, with the exception of but a few 
reports. 


ROBERT A. DENTLER AND BERNARD MACKLER 


The reviewers proposed that future 
studies should sharpen the concept 
of mental ability or include dimen- 
sions that concepts of group structure 
suggest are of probable importance in
a given situation. Studies that attend 
exclusively to the relation between 
intelligence and status should be 
avoided, while efforts to predict 
status within groups undergoing 
formation or change should be in- 
creased. 

Without doubt, one of the most active
research areas in psychology during
the last decade has been the study
of test taking response sets, or styles. 
In particular, attention has been de- 
voted to investigating their influence 
on personality inventory scores. This 
work has been confined almost
exclusively to only three types of
response tendency: the social desirability
set, characterized by the consistent
endorsement of desirable
traits and the denial of undesirable
ones; the deviation of a pattern of
scores from the typical pattern
produced by a given population of
responders; and the acquiescence set,
which consists of tendencies to choose
the “true,” “agree,” or “like” option
rather than their respective negative
alternatives. Jackson and Messick
(1958) have reviewed the research in 
this area, and have outlined their 
own suggestions for directing the 
course of future investigations. The 
purpose of the present review is to 
discuss one of the specific trends 
which has appeared in the area, sub- 
sequent to, and perhaps largely 
attributable to the Jackson and 
Messick (1958) article. 

¹ The author is indebted to Douglas Jackson
and Lee Sechrest for reviewing the original
manuscript and offering their valuable
suggestions for the final draft. Appreciation is also
expressed to H. J. Wahler for initially
stimulating the author’s interest in response style
research.

The development of particular
interest is in the utility of the response
set component of test scores. Whereas
Lentz (1938) and Cronbach (1946,
1950) urged the control or elimination
of noncontent determined variance,
Jackson and Messick (1958)
suggest that “for certain purposes in 
personality assessment opportunities 
for the expression of personal modes 
for responding should be enhanced 
and capitalized upon” (p. 244).
Thus, the recent trend referred to 
above is based on the thesis that a 
response style has its roots in the un- 
derlying personality complex of the 
responder. It is proposed that in- 
dividuals who vary in the extent to 
which they manifest a particular
style of responding, will also vary in 
terms of certain measurable personal- 
ity traits. Various dimensions of 
personality have been suggested, and 
evidence collected to support this 
hypothesis. The most recent and 
most provocative article in this series 
(Couch & Keniston, 1960) concludes 
on the assertion that 

this integrated study ... has demonstrated
both the far-reaching importance of response 
set in the area of psychological tests and the 
major proposition that the agreeing response 
tendency is based on a central personality 
syndrome (p. 173). 


The relationship between response 
styles and personality traits appears 
to be a most promising problem for 
investigations in the near future. 
However, it is clear, even at this early 
stage, that the already available 
literature offers important implica- 
tions for the design and execution of 
future studies. It is to this question 
that the present review is addressed. 


PERSONALITY CORRELATES OF THE 
SOCIAL DESIRABILITY RESPONSE
STYLE 

No effort will be made here to re- 
view the vast literature which has 
accumulated on this topic since
Edwards (1953) reported a correlation
of .87 between the scaled social de- 
sirability of item content and the fre- 
quency with which it is endorsed. In 
general, research with this response 
style has continued to be directed at 
showing its influence on a variety of 
psychological inventories (Bendig, 
1959; Cowen & Tongas, 1959; Taylor, 
1959, 1961), or its appearance in 
various clinical groups (Wahler, 
1958). Recently research designed to 
study the complexity of the social 
desirability response style has re- 
vealed its multidimensional char- 
acter (Messick, 1960a) and its inter- 
action with other response styles, 
particularly acquiescence (Jackson, 
1960; Jackson & Messick, in press; 
Messick, 1960b; Messick & Jackson, 
1961). 

The present literature contains 
only two suggestions that social 
desirability responding is related to 
basic personality traits. The first 
of these is barely more than a tenta- 
tive guess. Allison and Hunt (1959) 
investigated the relationship between 
social desirability responding and the 
expression of aggression with various 
degrees of frustration. Social desira- 
bility tendency was measured by the 
Edwards Social Desirability scale 
(SD scale) and correlated with ex- 
pression-of-aggression scores from a 
paper-and-pencil situational frustra- 
tion test. In two experiments, they 
found that high SD scale scores are 
associated with a suppression of 
aggression in ambiguous situations 
where the culturally acceptable re- 
sponse is unspecified. Their tentative 
conclusion was that high social 
desirability tendencies are found in 
subjects who are “other-directed,”
whereas low social desirability
tendencies characterize “inner-directed”
individuals. 

Crowne and Marlowe (1960) crit- 
icized the usual approach to the 
social desirability response style.
They point out that one is never 
certain when to invoke the more 
parsimonious explanation that denial 
of undesirable traits is due to the 
genuine absence of the psychiatric
symptoms usually embodied in
the self-description inventories sup- 
posedly most affected by this tend- 
ency. Similarly, endorsement of 
desirable traits may reflect either 
defensiveness, or candid self-appraisal,
especially in college students.²
Hence, they offer a substitute method 
for measuring the social desirability 
tendency. The Marlowe-Crowne 
Social Desirability (M-C SD) scale 
is a 33-item inventory. Like the 
MMPI L scale, these items suggest 
behaviors which, while socially
desirable, cannot be endorsed by most
people if they are answering truthfully.
It is the authors’ contention that 
social desirability responding rests on 
a basic need of the individual to be 
accepted and approved of socially. 
To test this notion, Marlowe and 
Crowne (1961) studied the relation- 
ship between SD scores and be- 
havioral tasks in the laboratory. 
They employed the Spool Packing 
task developed by Festinger and 
Carlsmith (1959). This is a boring, 
seemingly meaningless task designed 
to arouse negative, antagonistic feel- 
ings. They predicted that individ- 
uals with high SD scores have a 
strong need for social approval, and
would thus hold favorable attitudes
toward the experimental situation
following the Spool Packing task.
More negative attitudes were
predicted for the low SD scorers. SD
scale scores were dichotomized at the
mean, and these two groups were
shown to differ significantly (p<.01)
on the attitudes they expressed
toward the task. The difference was in
the predicted direction, with the high
SD group expressing more favorable
attitudes than the low SD group.
They also observed a correlation of
-.54 between the M-C SD scale and
the Barron Independence of Judgment
scale (1953) which is designed
to measure social conformity.

² This point is made by Crowne and Marlowe
with specific reference to responses made
by college students serving as subjects in
social desirability research projects. It is not
necessarily a criticism of the usual clinical
interpretations of social desirability scores, such
as Wahler’s (1958) prognostic index for
psychotherapy candidates, or the various
MMPI scales designed to assess defensiveness.
However, two recent studies (Jackson &
Messick, in press; Messick & Jackson, 1961)
have demonstrated the response set influence
in the MMPI, and discussed important
implications for the validity of these scales as
they are currently used clinically.

To follow up this latter finding, 
Strickland (1960) administered the 
SD scale to subjects and also observed 
their behavior in an actual conformity 
situation similar to the original Asch 
(1956) procedure. When the SD 
scores were again dichotomized at 
the mean, the groups differed signif- 
icantly (p<.005) on the basis of 
their yielding scores, with yielders 
having the higher need for social ap- 
proval. 

The preceding data are not pre- 
sented here for the purpose of show- 
ing the construct validity of the M-C 
SD scale. These studies are impor- 
tant in that they represent a major 
attempt to relate social desirability 
responding to underlying personality 
variables. It is even more important 
that these studies have employed a 
procedure which is rare in the area of 
response style research. Seldom does 
one find studies wherein the stylistic 
variable is correlated with methodo- 
logically independent observations. 
The typical procedure has been to 
employ as criteria, other psycho- 
metric instruments containing a pos- 
sibly strong methodological contami- 
nation. This attempt to seek an inde- 
pendent criterion is a highly valued 
step in psychological research, and is
a point on which this review will 
focus later. 


PERSONALITY CORRELATES OF 
DEVIANT RESPONSE PATTERNS 


Several studies have appeared in
recent years dealing with general
aspects of personality which relate to
deviant responding in a variety of 
stimulus situations. This research 
has usually been presented as an at- 
tempt to validate the Deviation 
Hypothesis. Berg (1955, 1957, 1958, 
1959) has formulated the Deviation 
Hypothesis notion as an extension of 
the concept of “set” or Einstellung.
He suggests (1959) that it serves as 
a unifying principle to account for 
the results of many disparate studies 
seeking to predict behavior under 
widely varying conditions. In sim- 
plest form the Deviation Hypothesis 
asserts that deviant behavior is gen- 
eral; that deviant responses occurring 
in one “uncritical” area of behavior
predict the occurrence of deviant
responses in other “critical” areas. In
a recent paper Sechrest and Jackson 
(1960) pointed out the broad
generality of Berg’s notion.

It has been suggested that psychotics, lawyers,
cardiac patients, transvestites, young normal
children, character disorders, the obese, the
feeble minded, psychoneurotics, and persons
suffering from constipation, among others,
represent deviant groups which might be
expected to manifest their particular propensities
toward deviation not only in a modality
relevant to their particular symptoms and to
items with relevant content, but also in
response to one or more of the following:
preference for abstract drawings, food aversion
questionnaires, stimuli for conditioned
responses, autokinetic and spiral aftereffect
situations, vocabulary test items, figure
drawings, musical sounds, and olfactory stimuli
(p. 2).


Evidence to support this position 
has been presented by Grigg and 
Thorpe (1960). They administered 
the Gough 300-item adjective checklist
to a college freshman class and
computed the frequency with which 
each item was checked by students as 
being self-descriptive. Those adjec- 
tives endorsed by more than 86% and 
those checked by fewer than 14% of 
the students were selected and formed 
a final 72-item list. This adjective 
checklist was then presented to the 
next incoming freshman class, and 
their self-descriptions obtained. De- 
viation scores were computed for 
each student by counting the number 
of commonly checked adjectives 
which he omitted, plus the number 
of rarely endorsed items he checked. 
At the end of the academic year 
students in this sample who appeared 
at the counseling center for voca- 
tional guidance, or for personal coun- 
seling, or who sought private psychi- 
atric care in the community were 
identified. A control group was
randomly selected from among the
freshmen who did not fall in any of
these three categories. When the
deviation scores were compared for 
these four groups it was found that 
those seeking either private psychi- 
atric care or personal counseling for 
emotional problems had significantly 
higher scores (p <.01) than those who 
sought only vocational guidance, or 
no help at all. The two higher 
groups were not significantly differ- 
ent from one another, nor were the 
two lower groups. 
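
The scoring procedure described above can be sketched as follows. The 86% and 14% cutoffs and the scoring rule come from the description of Grigg and Thorpe's method; the function names, adjectives, and endorsement rates are hypothetical and serve only to make the rule concrete.

```python
def select_items(endorsement_rate, high=0.86, low=0.14):
    """Split adjectives into commonly and rarely endorsed sets.
    endorsement_rate maps adjective -> proportion of the norming
    class that checked it as self-descriptive."""
    common = {a for a, p in endorsement_rate.items() if p > high}
    rare = {a for a, p in endorsement_rate.items() if p < low}
    return common, rare


def deviation_score(checked, common, rare):
    """Number of common adjectives omitted plus number of rare
    adjectives checked by one student."""
    omitted_common = len(common - checked)
    endorsed_rare = len(rare & checked)
    return omitted_common + endorsed_rare


if __name__ == "__main__":
    # Hypothetical endorsement rates from a norming class.
    rates = {"friendly": 0.93, "honest": 0.90, "moody": 0.40,
             "bitter": 0.10, "cruel": 0.05}
    common, rare = select_items(rates)
    # A hypothetical student omits "honest" and checks "bitter".
    print(deviation_score({"friendly", "moody", "bitter"}, common, rare))
```

Adjectives with intermediate endorsement rates, such as "moody" in the toy data, drop out of the final list and do not affect the score.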

This study is the most recent in- 
vestigation of the type initiated by 
Berg and Collier (1953). They 
demonstrated that the tendency to 
give extreme responses was a deviant 
pattern which would differentiate 
high anxiety males from low anxiety 
males. They used the ambiguous pic- 
tures of the Perceptual Reaction Test 
(PRT) (Berg, Hunt, & Barnes, 1949) 
as a response set measure. Barnes 
(1955) demonstrated that psychiatric 
patients with various diagnostic labels 
could be differentiated from one
another, and from normal control 
subjects on the basis of their pattern 
of responses to the PRT. 

It should be evident that the 
studies reviewed here are dissimilar 
in one major respect to others in this
area; namely, there is an absence of 
any attempt to specify a particular 
trait or personality variable which is 
related to deviant responding, and 
therefore considered basic to it. 
While Berg has clearly shown that 
different personalities differ in the 
response pattern they produce, he 
has apparently not considered it 
necessary to hypothesize an under- 
lying personality trait, and show how 
it relates to deviant responding. 
However, Berg (1959) has acknowl- 
edged several unresolved questions in 
his unfinished work. He suggests 
that the forthcoming research may be 
expected to provide important data 
concerning the environmental, per- 
sonality, and biochemical variables 
which may be related to atypical re- 
sponse patterns. 

Sechrest and Jackson (1960) con- 
sider the deviation of response pat- 
terns from group to group to be an 
extremely important research area 
with far-reaching implications for 
personality assessment. Yet, having 
learned from the “school of hard 
knocks and tough breaks” that
psychological processes are not always
simple unidimensional variables, they 
voice a healthy skepticism that the 
Hypothesis is really as general as 
Berg would apparently have his 
audience believe. Fortunately, Berg 
(1959) identified, as one of the major 
sources of difficulty in his work, the 
lack of operationally clean criteria for 
the identification of deviant and 
criterion groups. He has stressed the 
importance of selecting these groups 
on the basis of valid behavioral
characteristics. It is to be expected that
future studies will attempt to show
via operational criteria the extent to
which the Deviation Hypothesis is
applicable to measuring specific
personality traits, and the conditions
under which its generality is limited.

RICHARD K. MCGEE


PERSONALITY CORRELATES OF 
RESPONSE ACQUIESCENCE


The tendency to respond “yes,”
“agree,” or “true” to personality inventory
items irrespective of their
content has been the subject of many 
studies in recent years. In reviewing 
this area, Jackson and Messick (1958) 
concluded: 
In the light of accumulating evidence it seems 
likely that the major common factors in per- 
sonality inventories of the true-false or agree- 
disagree type, such as the MMPI and the 
California Psychological Inventory, are in- 
terpretable primarily in terms of style rather 
than specific item content (p. 247). 


In line with the extensive interest in 
response acquiescence per se, there 
have been numerous suggestions that
herein lies a new device for observing
systematic behavior which will lead
to valid inferences about the nature
of a particular “black box.”


Authoritarians and Conformers 


The acquiescence set first attracted 
major attention in connection with 
its influence on the California F 
Scale (Adorno, Frenkel-Brunswik, 
Levinson, & Sanford, 1950). Thus it 
is quite natural that acquiescence has 
been closely linked with the trait of 
authoritarianism. When it was no 
longer reinforcing to crucify the F 
Scale because of its susceptibility to 
response style influence (Bass, 1955, 
1957; Chapman & Campbell, 1957; 
Cohn, 1953; Jackson & Messick, 
1957; Jackson, Messick, & Solley, 
1957; Messick & Jackson, 1957) at- 
tention turned to the question of a 
psychological (as well as mechanical)
relationship between response
acquiescence and authoritarianism.
Leavitt, Hax, and Roche (1955) sug- 
gested that the confounding of acqui- 
escence and authoritarianism in the 
F Scale was a lucky accident which 
increased the discriminating power of 
the instrument. This conclusion was 
based on their view that the tendency 
to agree with things said in an au- 
thoritative manner is itself a factor of 
the authoritarian personality. Gage 
and his associates (Gage & Chatter- 
jee, 1960; Gage, Leavitt, & Stone, 
1957) have argued that negative 
items have more validity for measur- 
ing authoritarianism than do positive 
items. Their reasoning rests on cer- 
tain assumptions about response ac- 
quiescence. They point out that dis- 
agreeing requires more self-confi- 
dence, ego strength, and personal 
security than does the act of agreeing. 
Hence, acquiescence is one of a family 
of traits including authoritarianism, 
conformity, or obeisance to author- 
ity. The person who responds in an 
acquiescent manner essentially yields 
to the “authority” of the printed
word, or the physical stimulus how- 
ever presented. 

In a series of research projects 
Jackson (1955, 1958, 1959) has accumulated
data which tend to confirm
those “logical” arguments made
by Gage cited above. In his earlier 
studies Jackson (1955, 1958) formu- 
lated a theory of cognitive energy. 
This hypothetical construct is in- 
ferred from a person's ability to resist 
field forces presented by stimuli in 
his environment. The number of 
perceptual shifts made by reversible 
figures under instructions to hold one 
phase represents an operational meas- 
ure of a subject’s resistance to hypo- 
thetical forces in the perceptual field. 
Jackson demonstrated that this meas- 
ure is positively correlated with social 
conformity. Individuals who are able 
to “hold” the Necker cube in the
position instructed also have high
independence scores on the Inde- 
pendence of Judgment scale; subjects 
who have less resistance to the tend- 
ency to perceive changes in the posi- 
tion of the cube are also yielders, or 
conformers. Recently, Jackson (1959) 
has shown that resistance to these 
field forces is associated with acquies- 
cence response tendency as measured 
by F Scale scores. High acquiescers 
are low in cognitive energy whereas 
nonacquiescers are high in the energy 
required to resist the reversing of the 
figures. Thus, Jackson has presented 
empirical evidence to show that ac- 
quiescers are both conformers, and 
possessors of limited personality
strength or energy. In so doing, he 
confirmed two of the predictions
made by Gage and his associates,
referred to above. Certainly his data 
are more convincing than those which 
led Bass (1956) to the tentative conclusion
that the person with a high
social acquiescence score is “an ‘outward-oriented,’
insensitive, non-intellectual,
socially uncritical individual;
in short, a Babbitt—an unquestioning
conformer to social demands
placed upon him” (p. 297).


Noncritical Thinkers

While Jackson has shown that ac- 
quiescence relates to a general proc- 
ess of cognitive functioning, specific 
reference has been made to the rela- 
tionship between response style 
and critical, or analytical thinking. 
Frederiksen and Messick (1958) em- 
ployed Helmstadter’s (1957) method 
of separating the content component 
from the response set component of 
test scores. They observed the vari- 
ance due to response set in relation- 
ship to nine of the personality scales 
of the Personality Research Inven- 
tory (Saunders, 1955). In general 
they found low correlations, but con- 
cluded that their data suggested the 
possibility of using response sets to
measure some personality variables. 
Of particular interest in their report 
is the attention given to the trait of 
“criticalness,” defined in terms of
tasks employed in the study. It was 
shown that a set to be critical could 
be effectively induced for some tasks, 
but also that a significant (albeit low) 
negative correlation existed between 
criticalness and acquiescence meas- 
ured by the F Scale. Hence, this 
would tend to corroborate Bass’ 
(1956) assertion regarding the un- 
critical acceptance of situations by 
acquiescers. It might also be taken as 
confirmation of Jackson’s notion re- 
lating cognitive energy level to non- 
acquiescence, assuming that critical 
or analytical thinking requires the 
exertion of a relatively high level of 
effort. 

Additional data showing the inter- 
action between acquiescence and cer- 
tain cognitive variables have been 
published by Messick and Frederik- 
sen (1959). They showed negative 
relationships between acquiescence to 
the F Scale (both original and re- 
versed forms) and verbal knowledge, 
general reasoning, and deductive 
thinking. Previously Hardy (1956) 
had shown that certain scales of the 
CPI significantly predict academic 
achievement in a midwestern college 
population. Jackson (1960) observed 
that a feature which these scales had 
in common was a large number of
items keyed “false,” particularly
items for which a “true” response
would have been undesirable. Combining
these separate observations,
Jackson and Pacine (1960) reasoned
that an acquiescence style, moderated
by item desirability, should have a
relationship to academic achievement.
They examined this hypothesis
and found that a criticalness style
did predict grade-point averages to a
low but significant degree. Acquiescence
scores on modified F Scale
forms did not predict academic
achievement, but showed significant 
negative correlations with verbal 
knowledge, and consistent (not al- 
ways significant) negative relation- 
ships with general reasoning. 

There appear to be rapidly expand- 
ing pools of data which indicate that 
there are stable and meaningful rela- 
tionships between the response deter- 
minants stimulated by true-false or 
agree-disagree item forms and meas- 
ures of important cognitive variables. 


Yeasayers and Naysayers 


Perhaps the most ambitious re- 
search on the personality correlates of 
the acquiescence style has been pre- 
sented by Couch and Keniston 
(1960). They combined 681 items 
from several personality inventories. 
A factor analysis of responses to these 
items yielded a 360-item agreement 
factor which the authors labeled the 
Overall Agreement Scale (OAS). 


With additional measures, they found
positive correlations between OAS
and scales with a high proportion of
responses keyed true. (Where the
greater proportion of items was keyed 
false, the correlations with OAS were 
negative.) Traitwise, high OAS was 
associated with measures of impul- 
sivity, dependency, anxiety, mania, 
anal resentment, and anal preoccupa- 
tion; low OAS was associated with 
ego strength, stability, responsibility, 
tolerance, and impulse control. 

In addition to correlating OAS 
with these paper-and-pencil measures 
the authors made a searching clinical 
evaluation of their extreme respond- 
ers. Subjects were selected from each 
tail of the distribution of OAS scores; 
high scorers were identified as yea- 
sayers, and low scorers were labeled 
naysayers. Each subject then filled 
out a 55-item incomplete sentence 
form and participated in a depth
interview lasting from 2 to 4 hours
during which time the experimenter
focused on each of these 55 projective 
responses. Following the interview, 
the experimenter rated each response 
on five separate scales, indicating the 
extent to which the response was 
typical of the theoretical yeasayer, or 
the theoretical naysayer. Interviews 
were “blind” with respect to knowledge
of the subject’s OAS score, and
subjects were randomly divided 
among interviewers. Results revealed 
clear differences between the ratings 
made for yeasayers and naysayers. 
The authors describe these differences 
using the typical abstract clinical 
language: yeasayers are impulsive, 
emotionally reactive, extraverted, ex- 
ternally oriented, low in psychologi- 
cal inertia, and possess passive egos; 
naysayers are guarded, defensive, 
constricted, inhibited, introverted, 
withdrawing, introspective, high in 
psychological inertia, slow and critical 
reactors, and possess active egos. In 
summarizing their report, the authors
consider the dimension of Stimulus
Acceptance versus Stimulus Rejection
to be the best single construct
subsuming all the other specific traits 
related to agreeing response style. 
The similarity of this position to that 
of Bass (1956), Frederiksen and 
Messick (1958), and Jackson (1959) 
is quite significant. 

Webster (1960) utilized the Couch 
and Keniston (1960) stimulus ac- 
ceptance-rejection concept with his 
own speculation that response set 
(RS) variance is related to an inhibition
versus lack of inhibition dimension.
He concluded that another “all-pervasive
syndrome” is being isolated
for the understanding of personality.
This conclusion was based
on Webster’s data which showed that 
RS had high negative correlations 
with measures of Schizoid Function- 
ing and Impulse Expression. His RS 
scores are determined by the frequency
with which the subjects respond
“no” so as to deny undesirable
traits implying psychopathology. In 
line with the Crowne and Marlowe 
(1960) argument cited earlier, one 
would naturally predict these find- 
ings. But Webster carries his inter- 
pretation further: 

Finally ... it becomes clearer why RS is a
measure of inhibition; both these correlating 
scales measure lack of inhibition or control. 
In particular, Schizoid Functioning measures 
a kind of ego-diffusion which is very typical of 
the undercontrolled college student (p. 5). 

In summary, there appears to be a 
general agreement in the literature 
that there is a trait of response ac- 
quiescence, and that it is probably 
closely related to some personality 
variable. There is also high agreement
as to what to call this variable,
or what kind of dimension to put it
on: acquiescers are stimulus accepting,
uninhibited, conformers; nonacquiescers
are stimulus rejecting,
inhibited, independents.


THE PRESENT STATE OF AFFAIRS 


Throughout their provocative dis- 
cussion of acquiescence and person- 
ality variables, Couch and Keniston 
moved progressively further up the 
abstraction ladder. Beginning with 
individual item responses they moved 
via factor analysis, projective testing, 
and depth interviewing to the level of 
ego functioning and “psychological 
inertia.” Their progression was not
unique, for it paralleled the route 
taken by Bass from the endorsement 
of proverbs via correlational procedures
to “Babbittism!” This comment
is not intended to take issue
with the language used by former in- 
vestigators, nor to debunk the design 
of their research. Such language, 
while abstract, nevertheless communicates
ideas and feelings to a large
audience of psychologists, particularly
clinicians. Likewise such research
is an integral part of the inductive
process of theory building,
which is a valuable means to an end. 

However, left in this present state, 
the task is unfinished. To assume 
that the personality correlates of re- 
sponse acquiescence have been iden- 
tified is to make the present collection 
of inductive research findings an end
in itself. The task remaining should 
be obvious, i.e., the deductive formu- 
lation and testing of hypotheses to 
predict the behavior which the theory 
indicates should be related to the 
stylistic variables. 

This review of the literature has 
been organized around the author’s 
impression that what has gone on in 
the past has resulted primarily in de- 
scriptive information. The question 
proposed, and answered with progressively
more rigor and precision, appears
to have been: “What is the acquiescent
person like?” There may
be those who would argue that this is 
an inappropriate question to ask. It 
is only a minor variation of the 
“What is ...?” type of question
which Muenzinger (1957) labeled as
a “sterile exercise” in psychological
research. How this argument is to be 
resolved will be left for the reader to 
decide for himself. The point to be 
made here is based on the assumption 
that the previously gathered descrip- 
tive data do represent a valuable con- 
tribution to the area of personality 
assessment. But, it is now time to 
shift into low gear and change course 
so as to proceed down the abstraction 
ladder in the direction of observable 
behavior. A question of major im- 
portance for future research is one 
stated in predictive form: “What will
the acquiescent person do?”

At this point a short definition of 
terms is necessary to establish the 
proper Einstellung for the point to be
made in the following section. The
problem of definition revolves around
the usage of the term “observable behavior.”
It is essential to distinguish
behavior in situations especially devised
to observe a subject's responses
in the laboratory, or in field settings, 
from patterns of responses made to 
paper-and-pencil questionnaires or 
inventories. There is no argument, 
certainly, that the term observable 
behavior, broadly conceived, includes 
both activities. Perhaps a useful dis- 
tinction to some would be in terms of 
“psychometric versus nonpsychometric”
situations. Yet, again,
broadly defined, any measurement of 
behavior is a psychometric situation. 
There is a meaningful distinction between
the behavior involved in reporting
the distance traveled by a
stationary light in an autokinetic 
conformity task, and the behavior 
involved in marking the agree cate- 
gory on an IBM answer sheet. Cer- 
tainly there are few who would not 
grant this distinction, or who would 
not further grant that the distinction 
is based on the methodological inde- 
pendence of the two situations. In 
the following paragraphs the term 
observable behavior is used to denote 
responses elicited in a laboratory task 
as distinct from those elicited by the 
standard psychometric instrument. 


SUGGESTED CONSIDERATIONS FOR 
FUTURE RESEARCH 

Measures of Personality Variables 

The first point to be made in this 
regard has already been alluded to. 
There is a remarkable dearth of studies
in this area which have attempted to 
study the relationship between re- 
sponse style measures and observable 
behavior measures of personality var- 
iables. The investigations of Crowne 
and Marlowe and their students are a
notable exception. In the area of ac- 
quiescence only Jackson has used an 
independent behavioral measure of 
the trait he was studying along with
response style. The reasons for the 
needed shift to behavioral tasks are 
clear. Of primary importance is the 
fact, well recognized by most, that 
paper-and-pencil personality scales 
are heavily loaded with RS influence. 
To the extent that this is true, it 
naturally leads to inflated correla- 
tions between an RS measure, A, and 
a personality scale, B. If item con- 
tent is considered of minor impor- 
tance in determining a scale score, as 
RS research generally assumes, then 
correlating A with B to show the re- 
lationship of two response style meas- 
ures is a reasonable procedure. But 
to correlate A with B and conclude a 
relationship between response style 
and the trait purportedly measured 
by the content of B is a logically un- 
justifiable procedure. The paper by 
Webster (1960) which was discussed 
above is a good example of this re- 
markably inadequate approach to 
hypothesis testing. Personality in- 
ventories and RS measures are, by 
nature of their similar construction, 
to say nothing of item overlap, highly 
contaminated methodologically. Be- 
cause of the generalized operation of 
RS variables, one cannot serve as an 
independent criterion for the other. 
When Frederiksen and Messick 
(1958) corrected their personality 
scale scores for RS, they found quite
low relationships. Independently ob- 
served behavior is the only meaning- 
ful criterion measure of the personal- 
ity variables. 
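
The inflation argument can be illustrated with a small simulation. The
sketch below is not drawn from any of the studies reviewed. It assumes,
purely for illustration, that an RS measure A reflects only a response
style S plus error, that a personality scale B reflects an unrelated
trait T plus the same style S plus error, and that all components are
independent with unit variance. The names simulate and pearson_r, the
sample size, and the seed are hypothetical choices.

```python
import math
import random


def pearson_r(x, y):
    """Plain Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=5000, seed=0):
    """Return (r_AB, r_AT) under the assumed model:
    A = S + error, B = T + S + error, with T independent of S."""
    rng = random.Random(seed)
    style = [rng.gauss(0, 1) for _ in range(n)]  # response style S
    trait = [rng.gauss(0, 1) for _ in range(n)]  # content trait T, unrelated to S
    a = [s + rng.gauss(0, 1) for s in style]     # RS measure A
    b = [t + s + rng.gauss(0, 1) for t, s in zip(trait, style)]  # scale B
    return pearson_r(a, b), pearson_r(a, trait)


r_ab, r_at = simulate()
print(f"r(A, B) = {r_ab:.2f}   r(A, T) = {r_at:.2f}")
```

Under these assumptions the population value of r(A, B) is 1/sqrt(6),
or about .41, although A is unrelated to the trait T itself; the
correlation is carried entirely by the shared style component.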

Finally, inasmuch as the goal of 
psychological research is generally ac- 
cepted to be the prediction of be- 
havior, it is inadequate to stop short 
of that point. Admittedly laboratory 
conditions which provide the neces- 
sary controls of relevant variables 
also produce an artificiality which 
makes the situation unlike the sub- 
ject’s real world. Yet, it is a step in 
the direction of the ultimate criterion,
and one which justifies the added ex- 
pense and effort. 


Measures of RS Variables 


In the area of acquiescent respond- 
ing, several instruments and tech- 
niques have been proposed. They in- 
clude various kinds of content from 
aphorisms to statements of personal 
and social attitudes. Some instru- 
ments attempt to measure pure ac- 
quiescence by putting an individual 
in a situation where he is forced to 
respond to ambiguous, or essential- 
ly meaningless stimuli. Cronbach 
(1950) suggested that response tend- 
encies should be most apparent in 
situations where stimulus conditions 
are most uncertain. Berg and Rapa- 
port (1954) confirmed this expecta- 
tion by showing that consistent pref- 
erences for certain response options 
result when the subjects respond to 
an ‘‘unstructured questionnaire” 
wherein they guess about the non- 
existent items the experimenter is 
supposedly reading to them. Bass 
(1956) also used this unstructured 
technique and compared it with his 
content-laden Social Acquiescence 
Scale. He found a correlation of .00. 

If acquiescent behavior is to be interpreted
as the indiscriminate use of
yes, true, and agree options, irrespec- 
tive of item content, correlations 
should be high between content and 
noncontent measures of acquiescence. 
Or, if there are two types of acquies- 
cence as Bass has suggested, one 
would at least expect that various 
noncontent measures would consti- 
tute one of the types, and correlate 
highly with one another. McGee 
(1962) has found correlations which 
suggest that this is not true either. 

While this discussion has focused 
only on the measurement of acquies- 
cence, the point applies equally well 
to any response style one wishes to 
relate to central personality traits.
Consequently, the course of future 
research on the personality correlates 
of response styles is clearly indicated. 
The basic assumption is still in need 
of verification. But first, two things 
must be demonstrated: that tech- 
niques are actually available for 
measuring a “pure” response style
tendency, and that this response style 
variable can be used to predict be- 
havior in an independent situation on 
the basis of some theoretical inter- 
pretation of that variable. Until 
these two points are demonstrated 
there is no defensible evidence that 
test taking response style is related to 
underlying personality syndromes, or 
traits. 


SUMMARY 


The recent surge of interest in response
style components of personality
test scores has led to a more
specific interest in measures of re- 
sponse variables as predictors of un- 
derlying personality traits of the re- 
sponders. The research studies rele- 
vant to this question, most of which 
appeared in the literature since 1958, 
have been reviewed. The point was
made that these investigations have
provided meaningful abstract descriptions
of the personalities of individuals
with certain response style
tendencies, but little real defensible
data to tie response styles to the
criterion of independently measured
behavior. Suggestions were made for
designing future research efforts such
that the data will lead to a prediction
that the acquiescent individual will
do something in a particular situation
as well as merely say “yes” more
times than he says “no” on the F
Scale. Only with such data is it felt
that an adequate criterion will exist
for claiming a relationship between
response tendencies and basic personality
traits.


A NOTE ON THE INCONSISTENCY INHERENT
IN THE NECESSITY TO PERFORM 
MULTIPLE COMPARISONS 


WARNER WILSON 
University of Hawaii


Some studies involve only two 
groups and provide only one differ- 
ence to be tested for significance. 
Other studies involve several groups 
and provide many differences to be 
tested for significance. A question 
has arisen in the literature (Duncan, 
1955; Ryan, 1959; Tukey, 1949) as to 
how significance should be deter- 
mined when a number of tests are to 
be made in the same experiment. 
Ryan (1959) has performed a valua- 
ble function by pointing out that 
there are several ways of dealing with 
this problem. 

It would be possible to adopt a strat- 
egy that would hold errors constant 
per comparison, per hypothesis, per 
experiment, per group, or even per 
subject. The question is essentially: 
what is the appropriate unit in which 
to evaluate research? It is the thesis 
of this paper that the most defensible 
decision is to divide our work into 
separate tests of hypotheses and to 
hold constant the expected number of 
errors per hypothesis tested. 

The number of groups involved in 
the test of a single hypothesis may 
vary depending on the attitude of the 
experimenter and the nature of the 
hypothesis. Often an experiment 
determines the effects of several 
degrees of a measurable variable: in 
this case the hypothesis is usually 
that there is some relationship be- 
tween an independent and dependent 
variable. In this case differences 
between individual groups may be of 
little concern. For example, if 
length of food deprivation is varied at 
2-hour intervals from 2 to 24 hours, 
it is the overall variability between
groups that is of interest, not the 
difference between any particular 
pair of groups. A failure to find a 
difference between the 8-hour and 10- 
hour group would be of little impor- 
tance. In other cases several groups 
may be run that do not represent 
points on a measurable dimension 
and in such cases the difference be- 
tween each group and every other 
group may be viewed as a separate 
hypothesis. For example, if the re- 
sults of five different therapies are 
compared, the significance or non- 
significance of the difference between 
any two groups would probably be 
considered important. In this second 
case there would be more hypoth- 
eses but less data relevant to each 
one. If several variables are studied 
in a single experiment the significance 
of the effect of each variable and each 
interaction may be tested as a 
separate hypothesis. The practice of 
holding errors constant per hypoth- 
esis tested seems to be by far the 
most common in the literature: the F 
test is typically employed when the 
performance of several groups is sub- 
sumed under one hypothesis, and the 
t test is typically used to test differ- 
ences between pairs of groups when 
each pair is construed as bearing on a
separate hypothesis. Many, if not 
most, researchers are not even aware 
of the various special statistics that 
have been devised for the purpose of 
using some unit other than the 
hypothesis as the basis for error rate. 

It is necessary to recognize, however,
that all discussions in the
literature recommend some unit other
than the hypothesis as the basis for 
determining error rates. Ryan 
(1959) and Tukey (1953 unpub- 
lished), for example, favor the ex- 
periment as the preferred unit. The 
only dissenter to this general ap- 
proach seems to be Duncan (1955) 
who favors what is essentially a com- 
promise position. The purpose of this 
paper is to consider the pros and 
cons of the per-experiment versus the 
per-hypothesis approach. An at- 
tempt is made to make clear that 
some inconsistency is involved in 
either case and that a consequence
of this fact is that several of the 
arguments offered in favor of the per- 
experiment strategy are in fact offset 
by parallel, equally logical argu- 
ments, in favor of the per-hypothesis 
strategy. It is pointed out below that 
while it is impossible to prefer one 
approach to the other on logical 
grounds, other considerations ac- 
tually favor the per-hypothesis ap- 
proach. Ryan (1959) and Tukey 
(1953 unpublished) actually speak of 
a per-comparison (rather than a per- 
hypothesis) approach as the possible 
alternative to the per-experiment 
approach. Although the two may 
seem to be similar, the per-hypothesis 
approach is different from the per- 
comparison in that any number of 
comparisons may be considered in 
testing one hypothesis; however, the
arguments presented in relation to 
the per-comparison strategy apply in 
exactly the same way to the per- 
hypothesis approach. 

As Ryan makes clear, if a per-hypothesis
strategy is used, the same
number of errors will be expected in 
100 small experiments, each of which 
tests one hypothesis, as will be ex- 
pected in a large experiment that 
tests 100 hypotheses (Ryan, 1959, 
pp. 30-34). Ryan maintains that in- 
dependence of the tests or lack of it 
makes no difference: “The error rates
per comparison and per experiment
are completely unaffected by independence
or lack of it” (Ryan, 1959,
p. 34). Obviously if the error rate per 
hypothesis is held constant, the error 
rate per experiment will vary, de- 
pending on the size of the experi- 
ment. On the other hand, if the error 
rate per experiment is held constant, 
the error rate per comparison will 
vary, depending again on the size of 
the experiment. Since inconsistency 
is involved in either case a choice on 
purely logical grounds does not seem 
possible. If the implications of this 
fact are followed consistently, several 
of the arguments in favor of the per- 
experiment solution become meaning- 
less. 
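
The trade-off can be made concrete with a little arithmetic. The
following sketch is an illustration only, not a procedure taken from
Ryan or Tukey. It takes the per-experiment rate to mean the expected
number of errors per experiment and assumes that every hypothesis
tested is in fact null, so that each test errs with probability equal
to its nominal level. The function names are hypothetical.

```python
from fractions import Fraction


def errors_per_experiment(alpha_h, n_hypotheses):
    """Expected errors in one experiment when each hypothesis is tested
    at the fixed level alpha_h (all hypotheses assumed null)."""
    return alpha_h * n_hypotheses


def level_per_hypothesis(errors_per_exp, n_hypotheses):
    """Level each test must use so that the expected number of errors
    per experiment stays fixed at errors_per_exp."""
    return errors_per_exp / n_hypotheses


alpha = Fraction(5, 100)

# Per-hypothesis strategy: the per-experiment figure grows with size.
for k in (1, 5, 20, 100):
    print(k, float(errors_per_experiment(alpha, k)))

# Per-experiment strategy: the per-hypothesis level shrinks with size.
for k in (1, 5, 20, 100):
    print(k, float(level_per_hypothesis(alpha, k)))

# 100 one-hypothesis experiments versus one 100-hypothesis experiment:
# the expected total number of errors is the same in both cases.
many_small = 100 * errors_per_experiment(alpha, 1)
one_large = errors_per_experiment(alpha, 100)
print(many_small == one_large)
```

Because an expected count is a simple sum over tests, the equality in
the last comparison does not depend on whether the tests are
independent, which is the sense of the passage quoted from Ryan.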

Ryan (1959) and Tukey (1953 un- 
published) both argue that a per- 
hypothesis strategy implicitly gives a 
person license to make relatively 
more errors per experiment merely 
because he has been industrious in 
running many groups. Although this 
argument seems quite irrelevant to 
the issue, it is only fair to note the 
other side of the question. The per- 
experiment strategy implicitly gives 
a person license to make relatively 
more errors per hypothesis, merely 
because he has been lazy, as evidenced
by the running of few groups!
It is hard to see how the first argument
can be considered more compelling
than the second.

Ryan (1959) also argues that a per-hypothesis
strategy, by favoring the
person who is industrious, as evidenced
by the running of many
groups, may lead people who run
many subjects in a two-group experiment
to demand the privilege of
using a higher error rate because
they too have been industrious, as
evidenced by the running of many
subjects. While this argument seems
a little too artificial to deserve consideration,
it is once again easy to
point out the parallel counter-argu- 
ment. The per-experiment solution, 
by favoring the person who is lazy, as 
evidenced by the running of few 
groups, might lead those who run few 
subjects in a two-group experiment to 
demand the privilege of using a 
higher error rate because they too 
have been lazy! Once again it is hard 
to argue that the possible conse- 
quences of the per-hypothesis ap- 
proach are worse than the possible 
consequences of the per-experiment 
approach.

Some of the comments in the litera- 
ture (e.g., Ryan, 1959, pp. 35-37) 
may suggest that the use of a per- 
hypothesis strategy necessarily re- 
sults in an inordinate amount of error 
or at least in more errors than a per- 
experiment strategy. Such a con- 
clusion would be completely false. It 
is true that if a per-hypothesis error 
rate is employed there will be rela- 
tively more errors per experiment in 
large experiments, but it is also true 
that if a per-experiment error rate is 
employed there will be relatively more 
errors per hypothesis in small ex- 
periments. The total expected num- 
ber of errors can be controlled 
equally well no matter in what unit 
results of research are measured. In- 
sistence on fewer errors per-experi- 
ment would decrease total errors to 
be sure, but insistence on fewer errors 
per-hypothesis would decrease total 
errors equally well. Ryan actually 
concedes this point at one place, but 
apparently fails to recognize its im- 
plications (Ryan, 1959, pp. 37-38). 
Unless one wishes to argue that an 
error does more damage merely be- 
cause it occurs in a large experiment, 
it must be concluded once more that 
there is no logical basis on which to 
choose between the different strate- 
gies. 

The writer firmly agrees with those 
who think a more rigorous control of
errors is called for; however, he sug- 
gests that the most effective way for 
workers to achieve this is to hold the 
expected error rate constant at .001 
per hypothesis. Suppose one person 
publishes at the .05 level per experi- 
ment and a second publishes at the 
.001 level per hypothesis. Assuming 
that the second person's experiments 
test fewer than 50 hypotheses on the
average, he will make fewer errors 
both per experiment and per hy- 
pothesis than will the first person. 
Clearly an experimenter can be as 
rigorous as he wishes and still use the 
hypothesis as his research unit. 
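
The arithmetic behind this comparison can be sketched in a few lines of Python. The names `expected_errors` and `second_is_stricter` are illustrative, and the sketch assumes that every tested null hypothesis is true and that both workers test the same number of hypotheses per experiment, so that the stated levels can be read as expected numbers of Type I errors.

```python
def expected_errors(level, unit, hypotheses):
    """Expected Type I errors as (per experiment, per hypothesis).

    Assumes every null hypothesis is true and that `hypotheses` tests are
    made in each experiment (an illustrative simplification).
    """
    if unit == "experiment":
        return level, level / hypotheses
    if unit == "hypothesis":
        return level * hypotheses, level
    raise ValueError("unit must be 'experiment' or 'hypothesis'")


def second_is_stricter(hypotheses):
    """True if .001 per hypothesis gives fewer errors than .05 per experiment
    both per experiment and per hypothesis."""
    first = expected_errors(0.05, "experiment", hypotheses)
    second = expected_errors(0.001, "hypothesis", hypotheses)
    return second[0] < first[0] and second[1] < first[1]


if __name__ == "__main__":
    for m in (1, 10, 49, 60):
        print(m, second_is_stricter(m))
```

Under these assumptions the second worker is the stricter of the two for any number of hypotheses below 50, which is the condition stated in the text.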
Another type of consideration re- 
lates to the effect that each strategy 
might have on the behavior of re- 
searchers as they design, carry out, 
and write up experiments. It has 
been argued (Ryan, 1959, p. 36) that 
a per-hypothesis type approach encourages
investigators to include
“irrelevant” variables in their studies
merely to increase their chances of
obtaining one or more “significant”
findings to publish. Surely such
motivation is deplorable. However, 
it is doubtful that many researchers 
will deliberately resort to such 
tactics, and surely editors will be re- 
luctant to accept implausible false 
positives no matter what statistical 
techniques are used. Furthermore 
the line between adding irrelevant 
variables and exploring new possi- 
bilities is rather subtle and it is not 
at all certain that psychology would 
not profit from some additional blind 
seeking for relationships. It is nec- 
essary to insist on looking at both 
sides of the picture. What sort of 
pressures does the per-experiment 
procedure apply to the researcher? 
It seems likely that, for better or 
worse, most experimenters design 
studies to demonstrate relationships 
they believe to exist. Their desire is
to obtain data that will support their
hypotheses and compel others to 
accept them. Very generally it can 
be assumed that there is often a 
choice between testing a number of 
hypotheses in different experiments 
by running only the two groups ex- 
pected to be most extreme versus 
testing several hypotheses in one ex- 
periment by running several groups 
to determine the effects of each vari- 
able. 

The latter, more extensive type of 
study, is greatly to be preferred since 
it consumes less journal space per 
hypothesis, it allows for the evalua- 
tion of interaction effects, and it 
gives some idea of the shape of relationships.
The per-experiment approach
seems to discourage extensive
studies because the more extensive
the study the less the likelihood of
being able to accept any given hypothesis
as correct. In other words if a
per-experiment strategy is used, the
smaller the pieces in which one can
publish, the greater his chances of
having significant findings to report.
When a per-hypothesis strategy
is followed this additional encourage- 
ment to publish in small pieces is not 
present. The literature is currently 
cluttered with small one-shot studies 
and there is a relative dearth of well 
conceived, intensive investigations. 
Certainly all angles should be con- 
sidered before a strategy is advocated 
that might intensify this unfortunate 
tendency. Apparently either the
per-experiment or the per-hypothesis
strategy might have ill effects on cer- 
tain researchers, but once again it is 
hard to see the arguments in favor of 
the per-experiment approach as more 
compelling than those favoring the 
per-hypothesis approach. 

In addition it can be pointed out 
that there are strong advantages to 
the per-hypothesis solution. The 
basic question is, what is the most
meaningful unit in which to evaluate
research? Traditional practice ap- 
parently has chosen the hypothesis 
as the unit and this paper maintains 
that this is the correct choice. It 
seems that the hypothesis is psycho- 
logically the more logical unit. This 
writer, at least, would prefer to be 
confronted with a great array of find- 
ings, all of which (statistically speak- 
ing) have a comparable probability of 
being correct, rather than to be con- 
fronted with a number of conclusions 
each of which can be accepted with 
more or less confidence depending on 
the size of the experiment from 
which they were derived. 

Another major advantage of the 
per-hypothesis approach is the fact 
that it requires no additional learn- 
ing on the part of researchers. Ob- 
viously the more complicated statis- 
tics become the more time it will take 
to learn to use them and the less time 
will be available for research itself. It 
seems foolish for researchers to ac- 
cept additional statistical complica- 
tions unless there are telling reasons 
for doing so. It might also be added 
that it is practically impossible for a 
statistically naive researcher to 
abandon the traditional per-hypothesis
techniques because statisticians
have not yet agreed upon any
other strategy or even on how best to 
achieve the various alternatives that 
have been advocated. Duncan 
(1955) mentions nine different solutions
to the problem of multiple comparisons
and comments that, “Unfortunately,
these tests vary considerably
and it is difficult for the
user to decide which one to choose
for any given problem” (p. 2). One
purpose of Duncan’s article was to
propose still another solution. It has
not received general acceptance
(Ryan, 1959) and it seems apparent
that statisticians have no generally
agreed upon alternative to suggest as
a possible replacement for the
per-hypothesis approach.

It must be concluded that the
arguments in favor of the per-hypothesis
strategy are more numerous
and more compelling than those in
favor of the per-experiment solution.
Therefore the less effortful per- 
hypothesis approach should be con- 
tinued indefinitely unless valid argu- 
ments are presented in favor of a 
different strategy.

I am very glad that Wilson
(1962) has addressed himself to the 
basic issues involved in multiple com- 
parisons—issues partly of logic and 
partly of research strategy. There is 
a very real dilemma involved, and 
one which needs to be brought into 
the open even if we cannot reach a 
single solution. In my discussion of 
the problem (Ryan, 1959b), I be- 
lieved that the balance of the argu- 
ments favored the error rates based 
upon the experiment as the unit, and 
I stated that conclusion in its strong- 
est form. Perhaps the statement was 
one-sided, as Wilson believes, but I 
hoped that this would bring the issue 
more clearly to the readers than a 
less positive statement. Wilson has 
chosen the other horn of the dilemma 
and has done a service in stating the 
case for his choice so clearly. Many 
of my colleagues had previously ex- 
pressed unhappiness with my conclu- 
sions, not on logical grounds, but 
because experiment-based error rates 
made life more difficult for the researcher,
who must find bigger t’s for
significance.

Even a casual examination of the 
current journals will show that this is 
an issue which crops up in a very high
percentage of research reports, 
though usually unrecognized by the 
authors of the reports. Wilson's 
(1962) arguments (apart from his 
plea for more stringent significance 
levels) tend to support the status 
quo, but it would be a pity if his posi- 
tion were adopted simply out of 
inertia without careful weighing of 
the advantages and disadvantages. 
A thorough examination of the issues
might even lead us to abandon significance
testing in favor of some
more useful form of statistical treat- 
ment, although I do not yet see what 
might replace significance tests or 
confidence limits. 

The issues must be met and settled
by psychologists themselves, or by
other “consumers” of statistical
method. The statisticians can tell us
how to accomplish what we want to
do, but we must decide what we want
to do in terms of overall research
strategy. If we could quantify the
costs of doing research with various
experimental designs, the “earnings”
due to correct conclusions and the
“losses” due to Type I and Type II
errors, the whole problem could be
solved mathematically. Since, clearly,
this can be done only in such limited
and artificial situations that it could
not provide a general procedure, we
are forced to choose our procedures
on the basis of broad and qualitative
arguments.¹ This is what we do when
we choose a significance level for a
single comparison, since we are
balancing qualitatively the risks of
Type I error against the risks of
Type II error. The same kind of
qualitative balancing is necessary
when we consider the issue of error
rates per hypothesis versus error
rates per experiment. Unfortunately,
however, the balancing of risks becomes
much more complex than it is
for the simple, isolated comparison.

1 Tukey (1960) has recently made a distinction
between “decisions” and “conclusions,”
the latter being of more relevance to
scientific work. He also argues that the theory
of statistical decision is not appropriate to the
testing of conclusions.

I have asked for the opportunity to 
comment on Wilson’s (1962) paper, 
not because I want, or expect, to 
prove that he is wrong, but because 
some of the implications of his argu- 
ment need to be pursued further. His 
conclusion may be the best one, but I 
am not yet convinced. 

Let us first stipulate (a) an important
point of agreement and (b)
one issue which can profitably be left 
for a separate discussion: 

I would agree that there are many 
cases where overall analysis of vari- 
ance is more appropriate than multi- 
ple comparisons of individual means. 
These are cases which are essentially 
problems of regression, which I in- 
tentionally left out of my earlier 
analysis of the problem of multiple 
comparisons (see Ryan, 1959a, p. 
396). Even here the issue of error 
rates may rear its head if we wish to 
test separately for linear, quadratic, 
and higher order components, or if 
we wish to state where the maximum 
or minimum falls. This latter prob- 
lem becomes closely similar to the 
problem of multiple comparisons, so 
I shall not argue it separately. 

I shall leave aside the question of 
whether the various F tests in a com- 
plex analysis of variance should be 
treated with an error rate per F test 
(per hypothesis), or on the basis of 
an overall rate for the whole experi- 
ment. Here I only wish to make clear 
that the point of view I previously 
expressed on this problem (Ryan, 
1959b, p. 44) is not to be attributed 
to Tukey. Tukey prefers to follow 
current standard practice in complex 
analysis of variance, allocating an 
error rate to each F separately. If 
one of the variates is subjected to 
multiple comparisons, he would allow 
this family of comparisons the same 
error rate familywise that would 
otherwise have been allocated to the
single F test for this variate. When I
stated that the same arguments which 
lead to a familywise control of error 
could be used to support an overall 
control of error for the whole multi- 
variate experiment, this was my own 
conclusion, which Tukey did not accept.²
For the present let us leave
the multivariate problem out of the 
discussion as too complex to deal 
with until we have settled the more 
basic question of how to deal with the 
univariate experiment. 

Turning now to what appears to 
be the principal point of difference, 
Wilson objects to the argument that 
the control of errors per hypothesis 
gives the experimenter more chance 
of finding some significant differences 
merely by being more diligent and 
studying more different conditions. I 
will admit that this is a question of 
values (but not of morals). There 
are several questions of values in- 
volved throughout significance test- 
ing; e.g., choosing to work at the .01 
level instead of the .05 level is also 
a question of value, or how important 
we consider erroneous conclusions to 
be. Wilson is justified in questioning 
my point because the problem was 
incompletely analyzed in my earlier 
discussion. At the time I was merely 
trying to express the rather vague 
notion that obtaining significant re- 
sults should not depend solely upon 
the persistence or diligence of the 
experimenter. These are admirable 
qualities, but a statement of signifi- 
cance should also bear some relation 
to the facts of nature, as well as to the 
diligence of the experimenter in seek- 
ing out these facts. 

It must be emphasized that error
rates refer to what happens when the
null hypothesis is true. If the experimenter
is so perspicacious or so lucky
as to find real effects upon behavior,
the Type I error rates no longer apply
and considerations of power enter the
picture. If, on the other hand, he is so
unfortunate as to waste his energy
upon a real null situation, there is no
reason to allow him to make some
mistakes as a consolation prize.

2 J. W. Tukey, personal communication,
1956.

I will agree with Wilson, however, 
that this whole aspect of the relation 
of error rates to experimental in- 
vestment needs more exact analysis. 
Unfortunately there are so many 
facets to be considered at once that 
we have to oversimplify to make any 
sense of the problem. One approach 
to the problem is to try to hold the 
factor of experimental effort or cost 
constant while comparing different 
experimental designs. One approxi- 
mation would be to consider the total 
number of observations as a measure 
of the work done in the experiment. 
This might be a reasonable assump- 
tion on the average, since we are 
concerned with the choice of designs 
to be used with the same kind of 
measures. Suppose for example, that 
Experimenter A studies two condi- 
tions with 100 observations per condi- 
tion. Experimenter B also makes 200 
observations altogether but spreads 
them over 10 different conditions of 
the variable, and compares each 
mean with each of the others. A 
tests 1 null hypothesis, while B tests 
45 hypotheses with the same total 
amount of data. If the error rate is 
controlled per hypothesis, B can be 
expected to make .45 errors while A 
is making .01 errors, but their error 
rates will be equalized if the experi- 
ment is used as the base for comput- 
ing error rate. Thus it seems that the 
rate of error per experiment should be 
controlled if we wish to equalize the 
risks of Type I error for a given 
amount of experimental data spread 
over varying numbers of groups. 
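
The figures for Experimenters A and B can be reproduced with a short Python sketch. The function names are illustrative, and the sketch assumes, as the example does, that every null hypothesis is true and that each pair of means is compared at the .01 level.

```python
from math import comb


def pairwise_comparisons(conditions):
    """Number of distinct pairs of means among `conditions` groups."""
    return comb(conditions, 2)


def expected_type_i_errors(conditions, level, unit):
    """Expected number of Type I errors in one experiment when all nulls hold.

    unit="hypothesis": each comparison is tested at `level`.
    unit="experiment": the whole set of comparisons is held to `level`.
    """
    tests = pairwise_comparisons(conditions)
    if unit == "hypothesis":
        return tests * level
    if unit == "experiment":
        return level
    raise ValueError("unit must be 'hypothesis' or 'experiment'")


if __name__ == "__main__":
    for name, conditions in (("A", 2), ("B", 10)):
        print(
            name,
            pairwise_comparisons(conditions),
            expected_type_i_errors(conditions, 0.01, "hypothesis"),
            expected_type_i_errors(conditions, 0.01, "experiment"),
        )
```

With 10 conditions there are 45 pairs, so the per-hypothesis rule expects about .45 errors for B against .01 for A, while the per-experiment rule holds both at .01.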

Now consider Experimenter C who
also studies 10 conditions, but collects
a greater amount of data so that
the additional groups represent addi- 
tional experimental effort. He is, as 
Wilson points out, not allowed any 
more error on the per experiment 
basis. But this is also true if error rate 
is computed per hypothesis. Specifi- 
cally, suppose that B makes 200 
observations spread over 10 condi- 
tions, while C makes 1,000 observa- 
tions on the same 10 conditions. 
Both methods of computing error 
rate would treat the two experimenters
alike. One would allow both
to make .45 errors; the other would
allow both to make .01 errors.

C is not allowed any more Type I 
errors than B for his extra effort by 
any of the methods of computing rate 
of error, but C does gain in power 
from the extra observations. This is 
consistent with current practice, in 
that the error rate for A’s single 
comparison would not be changed if 
he collected more data for the two 
conditions. In short, controlling 
errors per experiment holds the 
amount of error constant for a fixed 
amount of experimental effort 
whether it is devoted to a single pair 
of conditions or many different condi- 
tions. Controlling the rate of error 
per hypothesis allows the error rate 
to increase as the number of groups 
increases, even if the same total 
amount of experimental effort is 
spread thinly over many groups. For 
both methods, additional observa- 
tions, without a change in the re- 
search design, are used to increase the 
power of the experiment but do not 
change the rate of Type I error. 

The above argument points up the 
fact that there is an arbitrary deci- 
sion involved in current practice even 
with single comparisons. It has been 
decided that power shall vary with 
number of observations, but that 
rate of Type I error shall not. This
presumably is the historical result of
the concept of significance develop- 
ing before the concept of power. 
Actually, of course, the concept of 
power derives from decision theory in 
which the rates of Type I and Type 
II error would both be variable and 
adjusted in terms of the costs of the 
two types of error. Instead, however, 
the concept of power has been tacked 
on as an adjunct to the significance 
test, but not controlled directly be- 
cause of our ignorance of the con- 
sequences of error. This is one of the 
aspects of present practice in signifi- 
cance testing which needs more 
thorough examination. For example, 
do we really want extra effort in re- 
search to be devoted to the detection 
of smaller and smaller differences? 
Meanwhile, if we accept current 
practice as the appropriate approach 
to single comparisons, the comparable 
solution to multiple comparisons
would be to control the error rate per
experiment.

There is another argument in favor 
of basing error rate upon the single 
hypothesis, an argument which Wil- 
son does not mention directly but is 
related to his fear of discouraging 
large-scale experiments if we adopt 
the error rate per experiment. This is 
the point that even the experimenter 
who tests a single hypothesis in an 
experiment is one of many who may 
be working upon the problem over 
the years, or he may be doing one 
experiment in a series of his own. 
Yet (the argument continues) it is 
accepted practice for him to test this 
single hypothesis as though it were 
isolated from all of the studies carried 
out by him or others in the past. If 
he is allowed to do this, why should 
the experimenter who tests several 
hypotheses in the same paper be 
penalized by requiring him to limit 
his errors for the total experiment
rather than for each hypothesis
separately?

This is a very powerful argument if 
we accept current practice as appro- 
priate, and if we agree that current 
practice is as described. I would ques- 
tion both of these assumptions, how- 
ever. An experimenter never con- 
siders his results in isolation from the 
rest of the available data in the field. 
One result which is out of line with 
other findings in the field is likely to 
be regarded with suspicion even if it 
is technically significant, and further 
replication will probably be called for. 
Even though this is not a quantified 
or explicit procedure it is in the same 
spirit as controlling errors for the 
total experiment in multiple compari- 
sons. We are handicapped in our 
knowledge of the total experimental 
background because of the failure to 
publish many negative findings and 
the consequent bias in the results 
available to us (see Sterling, 1959). 
Nevertheless we do, and should, try 
to take account of total mass of in- 
formation available to us in inter- 
preting any specific experimental 
result. 

Wilson (1962) fears that experimenters
will be discouraged from doing
large experiments with many
different conditions if we expect them 
to limit the total errors for the whole 
experiment. Yet why should they be 
discouraged? They are doing these 
experiments because they want to 
find real effects, not because they 
want to report Type I errors. Wilson 
seems to believe that I advocated 
error rates per experiment because I
wanted to increase the stringency of
our standards of significance. Advocating
the experiment as the
proper base for computing error rates
does not imply that we should set up
more stringent criteria of significance
than are now customary; this is a
separate and independent question.
It is true that a .01 level of sig- 
nificance experimentwise or per ex- 
periment means a much lower proba- 
bility level for individual comparisons 
within the experiment if many com- 
parisons are made. We do not have 
to work at the .01 level experiment- 
wise, however. The problem is not 
what particular probability should be 
chosen, but which method of com- 
puting the rate is comparable from 
one research to another. I have 
merely argued that the rates per ex- 
periment or experimentwise, whether 
.01 or .90, provide greater compara- 
bility from one research to another. 
It happens that I, like Wilson, do 
believe that we should use more 
stringent criteria of significance if 
we use significance tests at all. But 
he is quite correct in pointing out 
that greater stringency could be 
achieved by lowering the probability 
levels for errors per hypothesis as
well as controlling errors per experiment.
Consequently, the choice of
error rates is logically irrelevant to 
the issue of stringency. My own rea- 
sons for supporting greater strin- 
gency are based upon the belief that 
Type I errors are more dangerous in 
the present state of development of 
psychology than are Type II errors. 
In other words, I believe that it is 
less important if we miss some very 
small effect of a variable, than it is to 
claim that the variable has an effect 
(of unspecified magnitude) which
does not actually exist at all. This is,
however, another problem of many
facets which cannot be threshed out
here.³

To summarize, Wilson (1962) has 
presented some very strong argu- 
ments for controlling error rates per 
hypothesis instead of for the whole 
experiment. It is a service to have 
this side of the issue presented so 
clearly, and it is possible that he is 
right. There are, however, strong 
counterarguments which I have tried 
to present, and which still weigh 
heavily enough to convince me that 
we should control the error rates per 
experiment. To me, the strongest 
argument is that controlling the rate 
of error per hypothesis permits wide 
variation in the total amount of error 
expected for different experimental 
designs which involve the same total 
number of observations. 

The issue is by no means settled, 
however. There are many factors 
which must be weighed against each 
other, and there are probably some 
considerations that have not yet been 
dealt with. An adequate solution of 
the problem might even lead to an 
abandonment of significance testing 
in favor of some other method of deal- 
ing with the effects of sampling error 
which would not create the dilemma 
with which we are now faced. 


3 One especially important problem is what 
we shall do with negative results (see Sterling, 
1959; Tullock, 1959).

Many psychological studies yield
nominal data—numbers of subjects, 
objects, or responses—distributed in- 
to two or more mutually-exclusive 
categories. With data of this type the 
experimenter, and the reader, usually 
want to know if the observed fre- 
quencies differ significantly from 
what one would expect on the basis 
of chance. Chi square can be used for 
making such a test provided that the 
numbers of observed frequencies ex- 
ceed certain minimum requirements. 
The binomial test provides an exact 
test of significance for samples of any 
size but it can only be applied to data 
distributed into two categories. This 
article shows how the multinomial 
distribution can be modified and used 
as an exact test of significance for sam- 
ples of any size and for data distrib- 
uted into any number of categories. 
Although the test described here fol- 
lows closely those given by Smith and 
Duncan (1945, pp. 308-326) and 
Tate and Clelland (1957, pp. 35-36), 
it differs from both of them in certain 
important respects. 


DESCRIPTION OF THE METHOD 


The probability that a sample of
data will yield the frequencies
$n_1, n_2, n_3, \ldots, n_k$ distributed into k
categories is given by the multinomial
distribution:

$$P = \frac{(n_1 + n_2 + n_3 + \cdots + n_k)!}{n_1!\, n_2!\, n_3! \cdots n_k!}\; p_1^{n_1}\, p_2^{n_2}\, p_3^{n_3} \cdots p_k^{n_k} \qquad (1)$$

where the p’s are the proportions with
which the characters 1, 2, 3, ..., k
occur in the population.

1 This paper was prepared while the author
was on leave (1959-60) with the Office of
Naval Research in London, England.

When we use the multinomial 
distribution as a test of significance 
we assume that the null hypothesis 
holds, that is, that the p’s are equal 
and that each of them is equal to 
1/k, where k is the number of categories.
If, in addition, we let
$N = n_1 + n_2 + n_3 + \cdots + n_k$, then Equation
1 becomes:

$$P = \frac{N!}{n_1!\, n_2!\, n_3! \cdots n_k!} \left(\frac{1}{k}\right)^{N} \qquad (2)$$


One further addition is still neces- 
sary. If the null hypothesis holds 
then we should make no distinction 
between different permutations of 
any outcome. To take a particular 
example this means that all of the 
following six outcomes are equiv- 
alent: 

4 A’s, 3 B’s, 1 C
4 A’s, 1 B, 3 C’s
3 A’s, 4 B’s, 1 C
3 A’s, 1 B, 4 C’s
1 A, 3 B’s, 4 C’s
1 A, 4 B’s, 3 C’s

Adding this refinement to Equation 2
gives us:

where:

P = the probability of obtaining
any permutation of the frequencies
$n_1, n_2, n_3, \ldots, n_k$

N = the total number of observations
(individuals, objects, or
responses)

k = the total number of categories
into which the N observations
are distributed

i = any integer ≤ k

$t_i$ = number of ties of size i among
the k frequencies

j = any one category

$n_j$ = the number of observations in
the jth category


Note that Equation 3 without the
$(1/k)^N$ term tells us the number of ways
in which any particular outcome 
(such as 4,3,1) can occur. Note also 
that for k=2, Equation 3 reduces to 
the form of the binomial distribution 
which would be used for a test of this 
kind. 
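
The count described here can be sketched computationally. The Python below is a minimal illustration, not the article's own formula: it multiplies the multinomial coefficient by the number of distinct orderings of the frequencies, dividing out repeated (tied) frequencies in the standard way. The function names `ways` and `permutation_probability` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from math import comb, factorial, prod


def ways(outcome):
    """Number of ways N observations can fall into k equally likely
    categories so that the frequencies are some permutation of `outcome`."""
    n = sum(outcome)
    k = len(outcome)
    # Multinomial coefficient: assignments of observations for one fixed ordering.
    assignments = factorial(n) // prod(factorial(x) for x in outcome)
    # Distinct orderings of the k frequencies; tied frequencies are interchangeable.
    orderings = factorial(k) // prod(
        factorial(count) for count in Counter(outcome).values()
    )
    return assignments * orderings


def permutation_probability(outcome):
    """Probability of any permutation of `outcome` when each p equals 1/k."""
    return Fraction(ways(outcome), len(outcome) ** sum(outcome))


if __name__ == "__main__":
    print(ways((4, 3, 1)))
    # For k = 2 the result is the two-tailed binomial point probability.
    print(permutation_probability((7, 3)), Fraction(2 * comb(10, 3), 2 ** 10))
```

For two categories with unequal frequencies the count is simply twice the binomial coefficient, which is the reduction to the binomial case noted above.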

Equation 3 merely gives us the 
probability that N observations will, 
by chance, be distributed into k
categories with any particular set of
frequencies $n_1, n_2, n_3, \ldots, n_k$. To use
Equation 3 as a test of significance we
need to add to this the probabilities 
of all those outcomes which are even 
more deviant than the one observed. 


ILLUSTRATION OF THE METHOD 


To illustrate the application of this
formula I shall use some data collected
by Deininger (1960). In one
part of his experiment Deininger had
subjects use keysets in which the keys
had three different maximum displacements.
At the conclusion of
an unspecified number of trials, 12
subjects voted for the keyset they
liked least: eight disliked the largest
displacement, one the middle, and three
the smallest. The author concludes that
“the smallest displacement appears
controversial, the largest unpopular
and the middle the most desirable.”
The question we want to answer is: 
If the subjects really had no par- 
ticular dislikes and were simply vot- 
ing randomly how likely is it that we 
could have obtained an outcome as 
deviant as 8,3,1 by chance? 

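To apply Equation 3 note that
$N = 12$, $n_1 = 8$, $n_2 = 3$, $n_3 = 1$, $k = 3$,
$t_2 = 0$, and $t_3 = 0$. Inserting these
values into Equation 3 gives us:

$$P = \frac{3!\,1!\,2!\,3!}{[(0+1)\,2!]\,[(0+1)\,3!]} \cdot \frac{12!}{8!\,3!\,1!} \left(\frac{1}{3}\right)^{12} = (11{,}880)\left(\frac{1}{3}\right)^{12}$$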


The simplest way to illustrate the
full computation of the probability
we want is to list all possible outcomes
and the number of ways in
which each outcome can occur (since
the term $(1/3)^{12}$ appears as a constant
we can disregard it for the time being).
Table 1 gives these data.

Note that we have a check on these
computations since the total for
Table 1 is equal to $3^{12}$.

Before continuing we need to consider
what we mean by outcomes
“even more deviant than” 8,3,1.
What we mean are all those outcomes
which have an even smaller
probability of occurring. Table 2
lists these in order. The total for
Table 2 is 37,431 and this value multiplied
by $(1/3)^{12}$, or divided by 531,441,
gives us a probability of 0.070.
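
The whole computation can be sketched in Python. The sketch enumerates every outcome, counts the ways each can occur, and sums the counts of all outcomes no more probable than the one observed. The function names are illustrative, and the count of orderings uses the standard division by the factorials of repeated frequencies, which is an assumption about how ties are handled rather than a transcription of Equation 3.

```python
from collections import Counter
from fractions import Fraction
from math import factorial, prod


def partitions(n, k, largest=None):
    """Yield non-increasing k-tuples of non-negative integers summing to n."""
    if largest is None:
        largest = n
    if k == 1:
        if n <= largest:
            yield (n,)
        return
    smallest_first = -(-n // k)  # ceiling of n / k
    for first in range(min(n, largest), smallest_first - 1, -1):
        for rest in partitions(n - first, k - 1, first):
            yield (first,) + rest


def ways(outcome):
    """Ways in which any permutation of `outcome` can occur."""
    n, k = sum(outcome), len(outcome)
    assignments = factorial(n) // prod(factorial(x) for x in outcome)
    orderings = factorial(k) // prod(
        factorial(count) for count in Counter(outcome).values()
    )
    return assignments * orderings


def exact_p_value(observed):
    """Probability of an outcome at least as deviant (no more probable)
    than `observed`, with all k categories equally likely."""
    n, k = sum(observed), len(observed)
    limit = ways(observed)
    total = sum(w for w in map(ways, partitions(n, k)) if w <= limit)
    return Fraction(total, k ** n)


if __name__ == "__main__":
    counts = {p: ways(p) for p in partitions(12, 3)}
    print(len(counts), sum(counts.values()))
    print(exact_p_value((8, 3, 1)), float(exact_p_value((8, 3, 1))))
```

Run as written, the sketch lists 19 outcomes whose counts total 531,441, and the tail sum for 8,3,1 is 37,431, in agreement with the totals quoted for Tables 1 and 2.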

To summarize, if the null hypoth- 
esis is correct we could expect an 
outcome as deviant as 8,3,1 to occur 
about 7 times in 100. According to the 
usual conventions we would therefore 
conclude that this outcome is not 
statistically significant. 


TABLE 1

ALL POSSIBLE OUTCOMES WHEN 12
INDIVIDUALS ARE DISTRIBUTED INTO
3 CATEGORIES AND THE NUMBER
OF WAYS EACH OUTCOME
CAN OCCUR

Outcome      Number of ways the
             outcome can occur

12,0,0                    3
11,1,0                   72
10,2,0                  396
10,1,1                  396
9,3,0                 1,320
9,2,1                 3,960
8,4,0                 2,970
8,3,1                11,880
8,2,2                 8,910
7,5,0                 4,752
7,4,1                23,760
7,3,2                47,520
6,6,0                 2,772
6,5,1                33,264
6,4,2                83,160
6,3,3                55,440
5,5,2                49,896
5,4,3               166,320
4,4,4                34,650

Total               531,441


TABLE 2

OUTCOMES IN ORDER OF INCREASING
LIKELIHOOD UP TO AND INCLUDING
THE OUTCOME 8,3,1

Outcome      Number of ways the
             outcome can occur

12,0,0                    3
11,1,0                   72
10,2,0                  396
10,1,1                  396
9,3,0                 1,320
6,6,0                 2,772
8,4,0                 2,970
9,2,1                 3,960
7,5,0                 4,752
8,2,2                 8,910
8,3,1                11,880

Total                37,431


THE CASE OF TIES


The example given above is convenient
because it is small enough for
us to see all the essential computations
compactly. It does not, however,
illustrate one nuance of Equation 3,
namely, what happens when
some categories have tied observations.
In another part of his experiment
Deininger (1960) had 15 subjects
use five different keysets (call
them A, B, C, D, and E) which
differed in several ways. At the conclusion
of an unspecified number of
trials each subject voted for the keyset
he liked best. The results were 2
votes for A, 1 for B, 4 for C, 6 for D,
and 2 for E. Now we want to test the
significance of the outcome 2,1,4,6,2.
The novel feature of these data is
the two 2s.

In this case $N = 15$, $n_1 = 2$, $n_2 = 1$,
$n_3 = 4$, $n_4 = 6$, $n_5 = 2$, $k = 5$, $t_2 = 1$ (for
$n_1$ and $n_5$), $t_3 = 0$, $t_4 = 0$, and $t_5 = 0$.
Inserting these values into Equation 3
gives us:

$$P = \frac{5!\,1!\,2!\,3!\,4!\,5!}{2!\,2!\,3!\,4!\,5!} \cdot \frac{15!}{2!\,1!\,4!\,6!\,2!} \left(\frac{1}{5}\right)^{15} = \left(\frac{5!}{2!}\right) \frac{15!}{2!\,1!\,4!\,6!\,2!} \left(\frac{1}{5}\right)^{15} = 0.037.$$

The p of 0.037 is, of course, merely
the probability of getting exactly an
outcome of 6,4,2,2,1 or some permutation
of this outcome. The
probability of an outcome as deviant
as 6,4,2,2,1 (computed in a manner
analogous to that shown in Table 2)
is 0.49. It has, in short, no statistical
significance whatsoever.
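
The tied case can be checked with the same kind of sketch. The Python below is an illustration under the assumption that tied frequencies are handled by dividing the k! orderings by the factorial of each repeated frequency's multiplicity; the function names are hypothetical. It recomputes the point probability for 2,1,4,6,2 and prints a tail probability for comparison with the value reported in the text.

```python
from collections import Counter
from fractions import Fraction
from math import factorial, prod


def partitions(n, k, largest=None):
    """Yield non-increasing k-tuples of non-negative integers summing to n."""
    if largest is None:
        largest = n
    if k == 1:
        if n <= largest:
            yield (n,)
        return
    smallest_first = -(-n // k)  # ceiling of n / k
    for first in range(min(n, largest), smallest_first - 1, -1):
        for rest in partitions(n - first, k - 1, first):
            yield (first,) + rest


def orderings(outcome):
    """Distinct permutations of the frequencies; ties are interchangeable."""
    return factorial(len(outcome)) // prod(
        factorial(count) for count in Counter(outcome).values()
    )


def ways(outcome):
    """Ways in which any permutation of `outcome` can occur."""
    assignments = factorial(sum(outcome)) // prod(factorial(x) for x in outcome)
    return assignments * orderings(outcome)


def exact_p_value(observed):
    """Total probability of all outcomes no more probable than `observed`."""
    n, k = sum(observed), len(observed)
    limit = ways(observed)
    total = sum(w for w in map(ways, partitions(n, k)) if w <= limit)
    return Fraction(total, k ** n)


if __name__ == "__main__":
    votes = (2, 1, 4, 6, 2)
    print(orderings(votes))                      # the two 2s halve 5!
    print(float(Fraction(ways(votes), 5 ** 15)))  # point probability
    print(float(exact_p_value(votes)))            # tail probability
```

The two tied frequencies reduce the 120 orderings of five categories to 60, and the point probability rounds to 0.037 as in the worked equation.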


A COMPARISON OF THE EXACT
MULTINOMIAL TEST WITH
CHI SQUARE


As noted above, when k=2, the 
exact multinomial test given by 
Equation 3 reduces to the formula 
which would be used for an exact 
binomial test. It is of interest, how- 
ever, to compare the outcomes of the 
exact test given in this article with 
those of the chi square approximation 
commonly used for this purpose. 
Table 3 shows all possible outcomes
when 12 individuals are distributed
into three categories (from Table 1)
and the exact probabilities of obtaining
outcomes at least as deviant as
those listed. For comparison, the
third column of Table 3 shows the
corresponding probabilities² computed
by the chi square formula.
One reason why chi square probabilities
often do not agree with those
calculated by exact tests is that chi
square uses a continuous distribution
to approximate discrete ones.
To compensate for the errors involved
in this approximation statisticians
often recommend applying a
correction for continuity. Although
corrections for continuity tend to
overcompensate a little they do
usually bring chi square probabilities
into closer agreement with their true
values. In the fourth column of Table
3 the chi square probabilities have
been corrected for continuity according
to the method recommended by
Cochran (1952).


TABLE 3

EXACT AND CHI SQUARE PROBABILITIES
OF EVERY POSSIBLE OUTCOME WHEN 12
INDIVIDUALS ARE DISTRIBUTED INTO
3 CATEGORIES

Outcome | Exact probability | Chi square probability, uncorrected | Chi square probability, corrected for continuity

Table 3 shows some striking dis- 
crepancies between the chi square 
probabilities, both uncorrected and 
corrected, and those resulting from 
the exact test. Note especially the 
number of discrepancies in the critical 
areas around the 1 and 5% points. 
The outcome 6,6,0, for example, is 
significant at the 1% level by the ex- 
act test, scarcely significant at the 
5% level by the uncorrected chi 
square test, and not significant at 
the 5% level by the corrected chi 
square test. Similar large discrep- 
ancies occur for the outcome 9,2,1; 
8,2,2; and 8,3,1. 
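
The comparison described above can be re-created with a short program. The following Python sketch rests on two assumptions drawn from Table 2: that all k categories are equally likely under the null hypothesis, and that an outcome counts as "at least as deviant" when the number of ways it can occur is no larger than that of the observed outcome. The function names are illustrative and do not come from the article. The chi square probability uses the closed form exp(-x/2), which holds only for 2 degrees of freedom (k = 3), and no continuity correction is attempted.

```python
from collections import Counter
from fractions import Fraction
from math import exp, factorial


def outcomes(n, k, largest=None):
    """Yield every unordered outcome: k non-increasing counts summing to n."""
    if largest is None:
        largest = n
    if k == 1:
        if n <= largest:
            yield (n,)
        return
    for first in range(min(n, largest), -1, -1):
        if first * k < n:  # the remaining counts could not all be <= first
            break
        for rest in outcomes(n - first, k - 1, first):
            yield (first,) + rest


def ways(outcome):
    """Number of the k**n equally likely assignments that give this outcome."""
    coef = factorial(sum(outcome))
    for count in outcome:
        coef //= factorial(count)  # multinomial coefficient
    perms = factorial(len(outcome))
    for repeats in Counter(outcome).values():
        perms //= factorial(repeats)  # distinct orderings of the counts
    return coef * perms


def exact_probability(observed):
    """Probability of an outcome at least as deviant as the observed one."""
    n, k = sum(observed), len(observed)
    limit = ways(tuple(observed))
    total = sum(w for w in map(ways, outcomes(n, k)) if w <= limit)
    return Fraction(total, k ** n)


def chi_square_probability_2df(observed):
    """Uncorrected chi square probability; valid for three categories only."""
    expected = sum(observed) / len(observed)
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    return exp(-chi2 / 2)


for obs in [(6, 6, 0), (8, 2, 2), (8, 3, 1)]:
    print(obs, float(exact_probability(obs)), chi_square_probability_2df(obs))
```

The same `outcomes` generator also enumerates the second example (15 individuals in five categories), so the labor that makes the exact test tedious by hand is small for a modern computer.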

Smith and Duncan (1945) assumed, as the chi square test does, that the
probability of any occurrence is proportional to the evenness of the
distribution of the N observations in the k categories. For this reason
they stated that zones of rejection, and so the statistical significance
of any outcome, would be proportional to D², where, in my notation,

$$D^2 = \sum_{i=1}^{k} \left( n_i - \frac{N}{k} \right)^2 .$$

For the very small example they gave (N=5, k=3) this happened to be
true, but it is certainly not true in general. The outcomes 6,6,0 and
8,2,2 (7,5,0 and 8,3,1; 6,5,1 and 7,3,2; and 5,5,2 and 6,3,3) have
identical values of D² but markedly different probabilities of
occurrence by the multinomial distribution. The symmetry implicit in D²
is, in fact, one of the reasons why the chi square test yields
probabilities which differ so markedly from the true ones (Table 3).


² These values were obtained, by linear interpolation when necessary,
from the Pearson-Hartley (1956) tables.
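
The point about tied values of D² can be checked directly. The sketch below assumes that D² is the sum of squared deviations of the category counts from N/k (an assumption, since the printed formula is not legible here) and counts the ways each outcome can occur under equally likely categories. The helper names are hypothetical.

```python
from collections import Counter
from math import factorial


def d_squared(outcome):
    """Sum of squared deviations of the counts from their mean N/k (assumed form)."""
    mean = sum(outcome) / len(outcome)
    return sum((n - mean) ** 2 for n in outcome)


def ways(outcome):
    """Number of equally likely assignments that produce this unordered outcome."""
    coef = factorial(sum(outcome))
    for count in outcome:
        coef //= factorial(count)
    perms = factorial(len(outcome))
    for repeats in Counter(outcome).values():
        perms //= factorial(repeats)
    return coef * perms


# The four pairs of outcomes named in the text.
pairs = [
    ((6, 6, 0), (8, 2, 2)),
    ((7, 5, 0), (8, 3, 1)),
    ((6, 5, 1), (7, 3, 2)),
    ((5, 5, 2), (6, 3, 3)),
]

for first, second in pairs:
    print(first, second, d_squared(first), d_squared(second),
          ways(first), ways(second))
```

Each pair shares one value of D² while the two outcomes differ in the number of ways they can occur, which is the asymmetry the chi square statistic cannot represent.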

However, the real source of the 
discrepancies between the two kinds 
of tests lies even deeper than this. 
Although the formula for chi square 
is derived from that of the multi- 
nomial distribution (for example, 
Kendall, 1947), at three separate 
points the derivation makes use of 
approximations which are valid only 
for large Ns. Table 3 shows that the 
cumulative effect of the errors in 
these approximations may be con- 
siderable when chi square is applied 
to data with small n’s.


A DISADVANTAGE OF THE EXACT TEST


Perhaps the chief disadvantage of the exact test described here is that
it is laborious to calculate and very quickly becomes prohibitively
difficult to apply when N or k becomes large. The first example given in
this paper is relatively straightforward and not too tedious. The second
example (with 15 individuals distributed into five categories), however,
required the computation of 84 separate outcomes with a total of
30,517,578,125 ways in which the outcomes could occur. This problem is
almost a little too big for a desk calculator and a little too small for
a digital computer.


SUMMARY 


The exact multinomial test described in this article can be used to
test the significance of variations in the numbers of observations
distributed into two or more mutually exclusive categories. When there
are only two categories the test reduces to the binomial test. The test
is valid for samples of any size, but it quickly becomes prohibitively
difficult to apply as the total number of observations or the number of
categories increases. A comparison with the chi square test shows how
seriously the latter may be in error when the number of observations is
small.


THE ANALYSIS OF PROFILE DATA

Vanderbilt University


During the last 20 years, the crys- 
tal-ball-gazing test interpreter has 
gradually been supplanted by the 
profile-gazing tester. He gazes stead- 
fastly at the ups and downs on the 
profile chart for the Kuder Prefer- 
ence Record, the MMPI, and the 
Wechsler subtests; and from these he 
gives vocational advice, classifies the 
mentally ill, and searches for brain 
damage. Also, profile analysis has 
invaded psychological research,
where comparisons are made between 
self and ideal-self ratings, and meas- 
ures are made of interpersonal per- 
ception. Being scientific folk, some 
psychologists reasoned that if profile 
analysis is used (later it will be argued 
that sometimes it is better not to), 
then it should be used “objectively,”
i.e., in a mathematical and statistical 
framework. 

There are three kinds of questions 
that profile analysis needs to answer: 

1. How do you measure the rela- 
tive similarity of two profiles? 

2. How do you discriminate the 
typical profiles of two or more 
groups, e.g., MMPI profiles of dif- 
ferent diagnostic groups? 

3. How do you “cluster” profiles 
into homogeneous groups? A sur- 
prising amount of controversy has 
raged over how to answer these ques- 
tions (see Haggard, Chapman, Isaacs, 
& Dickman, 1959, for some proposed
solutions; see Sawrey, Keller,
& Conger, 1960, for other proposed 
solutions, and a review of the rele- 
vant literature)—surprising because 
there are relatively simple, and sat- 
isfactory (we hope) answers to all 
three. The proposed answers to each 
of these will be discussed in turn. 


SIMILARITY OF PROFILES 


There are two principal criteria by 
which to judge any measure of rela- 
tionship: it should consider all of the 
information relevant to the com- 
parisons and it should have mathe- 
matical properties which permit 
powerful methods of analysis. The 
first is partly a matter of prefer- 
ence; but once the desired measure is 
formulated, it greatly influences the 
kinds of analyses that can be per- 
formed. 

Cronbach and Gleser (1953) reviewed the many proposed measures of
profile similarity, criticized most of them, and recommended the use of
the d measure, which is the square root of the sum of squared
differences between profile elements. In an earlier paper, Osgood and
Suci (1952) had also proposed the use of d. The argument for the use of
d is that it considers all of the possible information in the profiles:
level, shape, and dispersion. With respect to the first criterion given
above for choosing a measure, d is appealing; and no one has proposed a
more appealing measure.
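
As an illustration of the definition just given, the following Python sketch computes d for two profiles. The scores are invented for the example and the function name is an assumption, not part of the original discussion.

```python
from math import sqrt


def d(profile_x, profile_y):
    """Square root of the sum of squared differences between profile elements."""
    if len(profile_x) != len(profile_y):
        raise ValueError("profiles must have the same number of elements")
    return sqrt(sum((x - y) ** 2 for x, y in zip(profile_x, profile_y)))


# Hypothetical scores of two persons on the same four scales.
person_a = [55, 60, 48, 70]
person_b = [58, 64, 48, 70]

print(d(person_a, person_b))
```

Because every element enters the sum in its raw form, a difference in level, in shape, or in dispersion between the two profiles all contribute to the same number.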

The d measure also stands up well 
with respect to the second criterion. 
By using a measure of interpoint 
distance in Euclidean space, powerful 
methods of analysis are indeed avail- 
able, ones which will be discussed 
more fully in the following sections. 
Because of these reasons, the author 
recommends, as others have, that 
profiles be considered as points in 
Euclidean space; however, as will be 
shown later, it is actually better to 
use a function of d rather than d itself 
in the analysis of profile data.

The use of d is appealing if, and
only if, it is intended to compare pro- 
files simultaneously with respect to 
level, shape, and dispersion. Later 
in the article it will be argued that in 
some studies it would be more mean- 
ingful to equate all profiles for level, 
and in other studies to equate for 
both level and dispersion, in which 
cases it would be more appropriate 
to use covariances, and correlations, 
respectively, rather than d. It will be 
shown that the same, powerful 
methods can be used with “raw” 
profiles as can be used with covari- 
ances and correlations. 


DISCRIMINATION OF GROUPS 


If one accepts the Euclidean 
model, a powerful method of analysis 
is available for discriminating the 
typical profiles of two or more groups, 
namely, the linear multiple-discrimi- 
nant function (Tatsuoka & Tiede- 
man, 1954). This will provide the 
best (in a least-squares sense) linear 
combination(s) for discriminating 
the groups, and it offers a procedure 
for assigning new individuals to one 
of the groups. For example, the dis- 
criminant function could be applied 
to the problem of differentiating the 
MMPI profiles of paranoids, psycho- 
paths, and schizophrenics; and the 
results could be used to classify new 
cases into one of the three groups. 


CLUSTERING “RAW” PROFILES


Clustering raw profiles is the prob- 
lem that has aroused so much dis- 
cussion, and the major purpose of this 
article is to attempt a satisfactory 
solution. This solution probably 
would have been adopted long ago 
had it not been for one mistaken 
notion among some psychologists
about multivariate analysis. 

Let us set the problem in focus by imagining that we are studying MMPI
profiles and that we have the profiles from a broad sample of psychotic
patients. We want to study the interrelations among the raw profiles in
such a way as to say how many “kinds” (clusters) of profiles there are,
and we want to measure the extent to which each patient belongs to each
cluster. First, we will assume that relationships among the profiles
should be pictured as interpoint distances in Euclidean space. (Some
arguments for so doing were given above.)

FIG. 1. Interpoint distances for six persons.
In Figure 1 are pictured the 
hypothetical points for six patients, 
which are shown as lying in a two- 
space in order to simplify the illustra- 
tion. By arbitrarily designating the 
distance from Person a to Person b 
as 1, all of the interpoint distances 
are set, and these are presented in 
Table 1. 

In looking at Figure 1 and Table 1 it is obvious that there are two
clusters, defined, respectively, by patients a, b, and c, and by
patients d, e, and f. If in actual research there were so few cases
involved and such definite clusters were present, no refined method of
analysis would be needed; but this is almost never the case. A method of
analysis will be demonstrated which can recover these clusters and can
be used equally well with any number of cases and regardless of the
relative “visibility” of clusters.

It is apparently not widely known 
that d matrices such as that in 
Table 1 can be factored. The method 
was derived by G. J. Suci (Osgood, 
Suci, & Tannenbaum, 1957). Suci 
and I cooperatively explored his 
method of factoring d and found it to 
be a special case of raw score factor 
analysis. This is where the major 
misconception arises: some psychol- 
ogists are evidently unaware that 
raw score cross-products can be fac- 
tored in the same way as correlation 
coefficients are factored. 

The failure to realize that factor analysis is not restricted to
correlation coefficients is either directly evident or implied in many
of the papers relating to methods of clustering profiles. Here is an
example (Sawrey et al., 1960):

Surely all factor analytic studies have not been interested in shape
alone, yet this is, in fact, all that correlations, and consequently
[italics added] factor analysis, takes into account (p. 670).


An Example of Raw Score Factor Analysis


Because of the unfamiliarity of factoring raw score cross-products, a
worked-out example will be given. The first step is to obtain the sum of
raw cross-products for each pair of patients over the profile elements.
For the MMPI this consists of accumulatively multiplying the scores on
corresponding scales for each pair of patients. A hypothetical matrix of
such cross-products corresponding to the d matrix in Table 1 is shown in
Table 2. Because I have chosen an artificial example, the cross-product
terms look different from what would be obtained from an actual study of
MMPI profiles.


TABLE 1

MATRIX OF d's FOR POINTS SHOWN IN FIGURE 1

[Table body illegible in the source.]

How should one analyze Table 2 in 
order to obtain clusters? The answer 
is to factor analyze, and any of the 
methods commonly used with cor- 
relations can be applied: square root, 
multiple group, centroid, principal 
components, or what not. In doing 
this, the customary formulas are 
applied in the customary ways. Let 
us see what a centroid analysis pro- 
vides. 

For the first factor, sum the ele- 
ments in each column, find the 
square root of the sum of the column 
sums, and divide this into each of the 
column sums. These are loadings on 
the first centroid factor in the raw 
score space. Use the first factor load- 
ings to obtain a first set of residuals, 
reflect, extract a second set of cen- 
troid loadings, and continue in this 
manner until residuals are “small” or
until enough factors have been ob- 
tained to satisfy the experimenter’s 
curiosity. 
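
The centroid steps just described can be sketched in Python. The six points below are hypothetical coordinates in a two-space chosen only for illustration; they are not the values of Table 2. The reflection step uses a simple rule that is an assumption of this sketch (signs are taken from the row with the largest remaining diagonal element), not necessarily the hand procedure the author followed.

```python
from math import sqrt


def cross_products(profiles):
    """Matrix of sums of raw cross-products for every pair of profiles."""
    return [[sum(x * y for x, y in zip(p, q)) for q in profiles] for p in profiles]


def centroid_factors(matrix, tol=1e-9):
    """Extract centroid factors until the residual diagonal is negligible."""
    n = len(matrix)
    residual = [row[:] for row in matrix]
    factors = []
    while True:
        pivot = max(range(n), key=lambda j: residual[j][j])
        if residual[pivot][pivot] <= tol:
            break
        # Reflection (assumed rule): sign pattern of the pivot row.
        eps = [1.0 if residual[pivot][j] >= 0 else -1.0 for j in range(n)]
        # Column sums of the reflected matrix and their grand total.
        col = [eps[j] * sum(eps[r] * residual[r][j] for r in range(n))
               for j in range(n)]
        total = sum(col)
        # Loadings, with the reflected signs restored.
        loading = [eps[j] * col[j] / sqrt(total) for j in range(n)]
        factors.append(loading)
        residual = [[residual[r][c] - loading[r] * loading[c] for c in range(n)]
                    for r in range(n)]
    return factors, residual


# Hypothetical persons a to f; the distance from a to b is 1.
profiles = [(1.0, 5.0), (1.0, 6.0), (2.0, 6.0), (5.0, 1.0), (6.0, 1.0), (6.0, 2.0)]
G = cross_products(profiles)
factors, residual = centroid_factors(G)

for person, row in zip("abcdef", zip(*factors)):
    print(person, [round(value, 2) for value in row])
```

When all cross-products are positive, as here, the first pass uses no reflection and reduces to dividing each column sum by the square root of the sum of the column sums, as in the text. With points lying in a two-space the loop stops after two factors.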

Because the points were chosen to lie in a two-space, only two factors
are needed to explain the cross-products, and, consequently, the second
residuals differ from zero only by rounding errors. Also, as would
necessarily be the case, the sums of squares of “loadings” in rows of
the factor matrix are identical to the original diagonal elements in the
cross-product matrix (which are the sums of squared scores over the
profile elements for each patient).

By applying the orthogonal trans- 
formation shown in Table 2, a ro- 
tated factor solution is obtained. The 
clusters shown in Figure 1 and Table 1 
are clearly evidenced in the rotated 
factor solution, and the factor load- 
ings tell how much each patient be- 
longs to each factor. In Figure 2 the 
rotated factors are plotted, and the 
obtained set of interpoint distances is
identical to that shown in Figure 1.
If one wants to cluster profiles, raw 
score factor analysis is a powerful and 
directly applicable procedure. 


How Raw Score Analysis Works 


Elements in profiles (e.g., the Paranoid scale of the MMPI) can be
considered as mutually orthogonal axes in Euclidean space. Each profile
can be “plotted” as a point in the space, and d measures the distance of
points from one another.


TABLE 2

RAW SCORE CROSS-PRODUCTS AND FACTOR SOLUTION FOR POINTS SHOWN IN FIGURE 1

[Table body illegible in the source. Panels: cross-products (with column
sums and first factor); first residuals (with column sums after
reflection and second factor); second residuals; centroid factors;
transformation matrix; rotated factors.]


Raw score factor analysis provides 
a basis (or semibasis) for the profile 
space. Because any sufficient basis 
preserves distances between points, 
the factor loadings preserve the orig- 
inal d’s. In the example worked out 
previously, this can be tested by ob- 
taining d’s from the two rotated fac- 
tors. This shows, for example, that 
the d between Persons a and b is, 
within the limits of rounding errors, 
1, which is what was given in Table 1. 
Similarly, all of the d’s can be calcu- 
lated from the factor matrix. If fac- 
toring is not complete, then the factor 
matrix will serve to explain the bulk 
of the original distances. 
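
The reason a complete factoring preserves the d's is that each squared distance is a simple function of the cross-products: the squared d between two profiles equals the sum of their two diagonal cross-products minus twice their off-diagonal cross-product. A minimal Python check follows, using invented profiles and hypothetical function names.

```python
from math import isclose, sqrt


def cross_product(p, q):
    """Sum of raw cross-products over the profile elements."""
    return sum(x * y for x, y in zip(p, q))


def d_direct(p, q):
    """d computed from the profile elements themselves."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def d_from_cross_products(p, q):
    """d recovered from the three cross-product terms for the pair."""
    return sqrt(cross_product(p, p) + cross_product(q, q) - 2 * cross_product(p, q))


# Hypothetical profiles on five scales.
first = [52.0, 61.0, 47.0, 70.0, 58.0]
second = [49.0, 66.0, 50.0, 64.0, 58.0]

print(d_direct(first, second), d_from_cross_products(first, second))
```

Any set of factor loadings that reproduces the cross-product matrix therefore reproduces every interpoint distance as well.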

The difficulty with most of the proposed measures of profile similarity
is that they are non-Gramian,¹ e.g., Cattell’s r_p (1949), and,
consequently, powerful methods of multivariate analysis cannot be used.
Of course, matrices of cross-products are necessarily Gramian, and
powerful methods of multivariate analysis, such as factor analysis, can
be applied to them. Whenever there is a choice between a number of
descriptive measures where one is Gramian and the others are not (e.g.,
the choice between point biserial and biserial correlation), it greatly
facilitates the analysis of results to choose the Gramian measure.


¹ By definition a Gramian matrix is one whose elements consist of
cross-products (Hohn, 1958, p. 202).

FIG. 2. Loadings on rotated factors.


Preparation for Analysis 


Much of the controversy about the 
analysis of profile data has concerned 
what, if any, transformations should 
be made before the data is analyzed. 
Regardless of what transformations 
are made, factor analysis of cross- 
product terms is a powerful method 
available to search for clusters. Two 
kinds of transformations have been 
proposed: transformations of distri- 
butions of individual differences on 
profile elements, and transformations 
of profiles as a function of intra- 
individual distributions. Each will 
be considered in turn. 

Individual differences. If the indi- 
vidual profile elements have grossly 
different standard deviations, some 
elements will contribute more to the 
interpoint distances than will others. 
For example, on the Rorschach, the 
number of F+ responses has a much 
larger standard deviation than the 
number of pure C responses, and, con- 
sequently, the former would more 
strongly influence the size of inter- 
point distances in a space of Ror- 
schach profiles. In many studies of 
profiles the elements have approxi- 
mately the same dispersions: MMPI, 
the Semantic Differential, and the 
subtests of the multifactor test bat- 
teries. When the standard deviations 
of profile elements are grossly dif- 
ferent, it is generally wise to equate 
them before using cross-products 
analysis to search for clusters. 

Profile elements differ not only in
terms of dispersions; they also differ
in terms of their factor compositions. 
For example, profile elements on the 
Semantic Differential differ in terms 
of the factors of evaluation, potency, 
activity, and others. To the extent 
that one factor is more prominent 
than others in the collection of profile 
elements, that factor will more 
strongly influence the size of inter- 
point distances. One way to offset 
the differential influences of such 
factors is to factor analyze the profile 
elements (correlating over persons) 
and reduce the profiles to sets of fac- 
tor scores. Then the cross-products 
analysis of profiles can be made of the 
sets of factor scores. For example, in- 
stead of beginning with a space of 
interpoint distances formed by indi- 
vidual Semantic Differential scales, 
we can begin the analysis of profiles 
by constructing a space of Semantic 
Differential factor scores. Factor 
analysis of cross-products applies 
equally well in this situation, and it 
can be used regardless of the kinds of 
transformations that are made on 
profile elements. 

If the purpose of the analysis is to 
discriminate the typical profiles of 
two or more groups (Question 2, 
posed earlier), then nothing can be 
gained from transforming score dis- 
tributions on profile elements. The 
discriminant function will provide 
the same results whether or not the 
dispersions are equated. Also, the 
resolution of profile elements into 
factors cannot possibly add to the 
discriminability that would be ob- 
tained from a discriminant-function 
analysis of the elements themselves. 
However, a prior factor analysis of 
profile elements is sometimes wise 
because it simplifies the subsequent 
use of the discriminant function, 
leaves less opportunity for the discriminant function to “take
advantage of chance,” and usually makes the discriminant functions more
interpretable.

Intraindividual distributions. If, as 
some claim, profiles should be clus- 
tered with simultaneous respect to 
level, shape, and dispersion, then 
factor analysis should be made of raw 
cross-products, either on the un- 
transformed profile elements or after 
transformations of the kinds dis- 
cussed previously are made. 

If level is considered to be un- 
important in clustering profiles, then 
the means of all profiles should be 
equated before the analysis, prefera- 
bly equated to zero. Next form cross- 
product terms and divide each by the 
number of profile elements; then 
factor by any of the conventional 
methods. This is called covariance 
factor analysis, but it is only a special 
case of cross-products analysis. 

If both level and dispersion are con- 
sidered unimportant, convert all pro- 
files to standard scores, standardizing 
over the profile elements. Then form 
a matrix of cross-products and divide 
each term by the number of profile 
elements. This gives a correlation 
matrix, and no one needs to be told 
that it can be factor analyzed. 
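
The three cases above differ only in what is done to each profile before cross-products are formed. The Python sketch below illustrates this with invented profiles; the function names are assumptions, and the standard deviation is taken over the profile elements with the number of elements as divisor.

```python
from math import sqrt


def equate_level(profile):
    """Subtract the profile's own mean so that its level is zero."""
    mean = sum(profile) / len(profile)
    return [x - mean for x in profile]


def equate_level_and_dispersion(profile):
    """Standardize the profile over its own elements."""
    centered = equate_level(profile)
    sd = sqrt(sum(x * x for x in centered) / len(centered))
    return [x / sd for x in centered]


def cross_product(p, q):
    """Raw cross-product: level, shape, and dispersion all retained."""
    return sum(x * y for x, y in zip(p, q))


def covariance(p, q):
    """Cross-product of level-equated profiles divided by the number of elements."""
    return cross_product(equate_level(p), equate_level(q)) / len(p)


def correlation(p, q):
    """Cross-product of standardized profiles divided by the number of elements."""
    return cross_product(equate_level_and_dispersion(p),
                         equate_level_and_dispersion(q)) / len(p)


# Hypothetical profiles: same shape throughout, differing in level or dispersion.
low = [2.0, 4.0, 6.0, 8.0]
high = [12.0, 14.0, 16.0, 18.0]   # same as low, but higher level
steep = [1.0, 4.0, 7.0, 10.0]     # same shape as low, larger dispersion

print(cross_product(low, low), cross_product(low, high))
print(covariance(low, low), covariance(low, high))
print(correlation(low, steep))
```

A matrix built from any one of the three functions is a matrix of cross-products of (possibly transformed) profiles, so the same factoring procedure applies to each.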

If the purpose of the analysis is to 
discriminate the typical profiles of 
two or more groups (Question 2 posed 
earlier), it is an empirical question 
whether or not transformations of in- 
traindividual distributions will help 
or hinder the outcome in particular 
studies. For example, if in a particular study all of the profiles are
equated for level, this might increase discriminability or it might
equally well lower discriminability. Consequently, before applying the
discriminant function, it is wise to compare groups with respect to
level, shape, and dispersion. If groups differ inconsequentially on any
of the components, it is wise to remove that component (or components)
before applying the discriminant function.


SHOULD PROFILES BE ANALYZED? 


Most of this article is concerned 
with how to analyze profile data. 
Equally important is the initial de- 
cision in research to make compari- 
sons among score profiles. Perhaps in 
many situations it would be wiser not 
to make such comparisons at all. 

The decision to use profile analysis is determined in part by
preferences for methodologies, which are, in essence, wagers about the
likely research payoff in the long run from choosing one method of
investigation rather than another. The reader can judge for himself
whether the studies using comparisons of profiles (e.g., measures of
“assumed similarity” in interpersonal perception) have borne the
expected fruit.

If analyses are made of the rela- 
tions among raw profiles, in which 
level, shape, and dispersion are pre- 
served, the results are often difficult 
to interpret. Particular results may 
be due to any one of the three profile 
components, and, without reanalyz- 
ing differently, there is no way to un- 
ravel the puzzle. Even those who 
initially advocated the analysis of 
raw profiles have since either advo- 
cated or practiced separate analyses 
of level, shape, and dispersion (for 
example, Cronbach, 1958). 

It was argued that factor analysis 
of cross-products is the best way to 
cluster profiles. However, when such 
analyses are made of raw profiles, it 
is sometimes difficult to interpret the 
results. Most of us have become so 
familiar with correlation coefficients, 
and factor analyses of them, that it 
raises some anxiety to look at factor 
loadings like —68.21 and 4.89. 

A good argument can be given for
the use of profile analysis in studies
of personnel decisions, e.g., selecting 
men for a particular job, or classify- 
ing patients for different kinds of 
treatment. If criterion variables are 
available, the validity of any decision 
strategy based on profile analysis can 
be determined directly. Then the 
only sense in which it is necessary to 
justify the analysis of profiles is to 
show that it works better than some 
other approach. For example, it 
might be found that a discriminant- 
function analysis of score profiles is 
more effective in classifying mental 
patients than is a multiple regression 
approach. The major difficulty in 
validating profile analyses is that in 
many types of personnel decisions 
there are no adequate criteria avail- 
able, and the questions of whether to 
use profile analysis and, if so, how, 
are left moot. 

It is more difficult to argue for the use of profile comparisons in
testing psychological theories. Many efforts have been made to assert
hypotheses about interpersonal perception, psychotherapy, empathy, and
others, in terms of profile similarities and differences. A major
drawback to formulating such hypotheses is that they inevitably involve
the semi-undefinable quality of “similarity.” Also, the general
experience has been that the results of such studies often are much
clearer when univariate comparisons rather than profile comparisons are
made. This is illustrated in some of the studies of before-after therapy
comparisons of profiles of “self” and “ideal-self” ratings. What has
generally been found is that nearly everyone has the same “ideal” before
therapy, and the ideal changes little during therapy. The change, if
any, is in the self, and the change is mainly toward higher self-esteem
(Rogers & Dymond, 1954, p. 417). Consequently, rather than assert vague
and complex hypotheses about similarities among profiles before and
after, it is much more meaningful to hypothesize that successful therapy
raises self-esteem. Studies of interpersonal perceptions (for example,
Bass & Fiedler, 1959) have also indicated that univariate comparisons
often are more revealing than profile comparisons.


SUMMARY 


Methods were suggested for han- 
dling three problems in the analysis 
of test profiles: measuring the simi- 
larity of profiles, discriminating the 
typical profiles of two or more groups, 
and clustering profiles into homoge- 
neous groups. The suggested meth- 
ods were, respectively: picturing pro- 
files as interpoint distances in Euclid- 
ean space, use of the linear multiple- 
discriminant function, and factor 
analysis of profile cross-product 
terms. Some suggestions were given 
about transformations of profile data 
before further analysis. Some opin- 
ions were stated about the appropri- 
ateness of profile analysis in studies 
of personnel decisions and in investi- 
gations of psychological theories. 
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The thesis of this paper is that sim- 
ple measures of the error for one- 
dimensional tracking, provided the 
right ones are used, can reveal the 
response strategy which the subject 
(S) adopts without involving an inor- 
dinate amount of work scoring rec- 
ords. The measures are the mean 
constant error (CE), and the standard 
deviation (SD) of the error indicating 
within-S variability, computed sep- 
arately for position and time at 
various points on the input. Sam- 
ples of the measures can be obtained 
relatively easily and quickly by hand 
from a record like that in Figure 1. 
When an electronic display is used it 
is possible to produce such a record 
by feeding the input and response 
into two channels of an oscillographic 
recorder, and subsequently super- 
imposing them. 

No special merit is attached to measuring by hand. Once it has been
decided which are the best measures to use, it is possible either to
build electronic devices to do the measuring, or to record performance
on, for example, magnetic tape, and to feed the tape into a computer
which is first programed to produce one measure, and then programed to
produce another (Webber & Adams, 1960). However, electronic devices and
computer programs only answer the questions which the experimenter (E)
asks of them; they cannot tell him what new questions to ask. Unless E
has available almost unlimited resources for automatic data processing
so that he can test out quite unlikely hypotheses, it may be advisable
to make measurements by hand on sample records in order to be sure of
not missing new and unexpected features. A simple clinical assessment
from watching S perform or from serving as S may motivate E to make the
necessary analyses, but is unlikely to tell him precisely what are the
best measurements to make.


¹ The author is indebted to M. Stone for discussions on the statistical
aspects of these problems, to J. A. Adams and E. J. Archer for
constructive criticisms, and to the British Medical Research Council for
financial support.

The parallel approach using the 
describing functions of the engineer 
can be dismissed in a few words, 
since it is less relevant to psychology, 
and has been ably summarized by 
McRuer and Krendel (1957). De- 
scribing functions are based upon 
mathematical systems of analysis 
designed primarily to determine the 
numerical values of the parameters 
of servomechanisms. They are thus 
capable of giving exact numerical 
values to those aspects of human 
tracking performance which resemble 
the parameters of servomechanisms 
(Ellson, 1959). But they are not so 
suited as simple measures are for 
determining the details of the ways 
in which human operators do not 
behave like servomechanisms; these 
include most of the phenomena 
studied by psychologists (Adams, 
1961, pp. 56-60). 


SOME SIMPLE METHODS OF SCORING

Overall Error in Position


One of the simplest measures of performance is the mean error in
position neglecting the sign. During World War II Craik and Vince (1943,
p. 5) recorded performance in one-dimensional tracking on a smoked drum
moving at 50 millimeters per minute, measured the error in position
every 1.0 millimeter (1.2 seconds), and took the mean value. This is
equivalent to drawing in Figure 1 a series of vertical lines connecting
the two functions and computing their mean height. (If a search is to be
made for high frequency components in the response, it is necessary to
sample at a frequency at least double that of the highest component. But
when using simple methods of scoring there is little point in sampling
the error more frequently than once per second provided an adequate
length of record is available. For the only valid statistical test of
whether the results are representative of the population from which the
Ss are drawn depends upon the differences between Ss; and there is a
stage beyond which increasing the reliability of the means of individual
Ss makes little difference to the variability between Ss.)

If it is desired to increase the 
penalty upon large errors, a function 
of squared error such as mean 
squared error or root mean squared 
error may be used instead of mean 
error. If small errors do not matter 
at all, a “target” area of a selected
size may be used and error be scored 
only beyond it. This is equivalent to 
increasing the width of the input 
function in Figure 1, and measuring 
the error from its edges. The method
is somewhat analogous to measuring 
only the time for which S is outside 
the designated target area (see time 
on target below). 
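
The overall measures of position error described so far can be written out in a few lines. In the Python sketch below the sampled positions are invented, the function names are hypothetical, and the error is taken as response minus input at each sampling point, which is an assumption about sign convention.

```python
from math import sqrt


def position_errors(input_positions, response_positions):
    """Signed error (response minus input) at each sampling point."""
    return [r - i for i, r in zip(input_positions, response_positions)]


def mean_error(errors):
    """Mean error in position neglecting the sign."""
    return sum(abs(e) for e in errors) / len(errors)


def root_mean_squared_error(errors):
    """Squared-error measure, which penalizes large errors more heavily."""
    return sqrt(sum(e * e for e in errors) / len(errors))


def mean_error_outside_target(errors, half_width):
    """Error scored only beyond a target area of the given half-width."""
    return sum(max(0.0, abs(e) - half_width) for e in errors) / len(errors)


# Hypothetical input and response positions sampled at equal intervals.
track = [0.0, 2.0, 4.0, 2.0]
response = [1.0, 0.0, 4.5, 1.5]
errors = position_errors(track, response)

print(mean_error(errors))
print(root_mean_squared_error(errors))
print(mean_error_outside_target(errors, half_width=1.0))
```

Scoring beyond a target area is the same as widening the input by the half-width on each side and measuring from its edges, so errors that fall inside the area contribute nothing.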

A major disadvantage of using only a measure of the overall error in
position neglecting the sign is that it confounds what are probably true
errors of position with errors of timing. The heights of the shaded
areas in Figure 1 can be looked upon as representing true errors in
position, since S should have stopped when he reached the points at
which the input reversed direction, but instead stopped short of them or
went on too far. In contrast, over most of the distance between
reversals the error can be looked upon as more in the nature of error in
timing, since S covered the right ground, but did so either too early or
too late. Measures of the overall error in position neglecting its sign
include both these rather different sources of error.

A measure which is probably more 
or less uncontaminated by errors in 
timing can be obtained by averaging 
the errors in position taking their 
signs into account. This mean CE 
shows the extent to which the re- 
sponse is on average to one side or 
other of the input. Unfortunately
it gives no indication of whether S
tended to overshoot or undershoot at
reversals, since adjacent overshoots
tend to cancel out in the side-to-side
dimension, and the same applies to
adjacent undershoots. The mean CE
is not very contaminated by errors in
timing provided the input is on average
symmetrical with respect to time
and position (as harmonic inputs are),
and provided the same is approxi- 
mately true of the response. For 
under these conditions the CEs for 
time tend to cancel out in the posi- 
tion dimension, and vice versa. 
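
A small numerical sketch may make the cancellation concrete. It assumes a single sine wave as the input and a response that is an exact copy delayed in time; both the waveform and the function names are hypothetical and stand in for real tracking records.

```python
import math

def mean_ce(input_pos, response_pos):
    """Mean constant error: position error averaged with its sign."""
    return sum(r - i for i, r in zip(input_pos, response_pos)) / len(input_pos)

def mean_abs_error(input_pos, response_pos):
    """Mean position error neglecting its sign."""
    return sum(abs(r - i) for i, r in zip(input_pos, response_pos)) / len(input_pos)

n = 360  # sample points over exactly one cycle of the input
phase = [2 * math.pi * k / n for k in range(n)]
input_pos = [math.sin(p) for p in phase]
lagged = [math.sin(p - 0.3) for p in phase]        # timing error only
offset = [math.sin(p - 0.3) + 0.2 for p in phase]  # same lag, shifted to one side

ce_lagged = mean_ce(input_pos, lagged)           # close to zero
ce_offset = mean_ce(input_pos, offset)           # close to the 0.2 shift
abs_lagged = mean_abs_error(input_pos, lagged)   # clearly above zero
print(ce_lagged, ce_offset, abs_lagged)
```

Under these assumptions the pure timing lag leaves the mean CE near zero but inflates the unsigned error, whereas a constant shift to one side shows up in the mean CE almost unchanged.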


Overall Error in Time 


During World War II Helson 
(1949, p. 477) used measures of time 
error in one-dimensional tracking as 
well as measures of position error. 
Measuring the error in time gives a 
mean lag if the sign of the error is 
taken into account, in addition to a 
mean error when the sign is neg- 
lected. In pursuit tracking with a 
reasonably random harmonic input 
of high frequency which cannot be
seen in advance, the two measures
tend to give similar values since the
response is rarely ahead of the input
under these conditions (Poulton,
1954, Table 3, fast course). However,
with a random low frequency or simple
harmonic input the mean lag may
be relatively small compared with 
the mean error. Again a function of 
squared error can be used instead of 
mean error if it is desired to increase 
the penalty upon large errors. 

Taking the mean error in time 
neglecting the sign is equivalent to 
drawing in Figure 1 a series of horizontal
lines connecting the two functions,
and computing their mean
length. Integrating the error in time 
with respect to position in this way 
gives the same result as integrating 
the error in position with respect to 
time, except in so far as S overshoots 
or undershoots at the reversals in the 
direction of movement of the input. 
Figure 1 shows that overshoots are 
not taken into account in the time 
dimension, since there is no input to 
which they correspond. Similarly 
undershoots leave a loop of the input 
without a corresponding part of the 
response function. Thus the shaded 
areas in Figure 1, which can be 
looked upon as predominantly error 
in position, are included in the over- 
all error in position, but not in the 
overall error in time. 

Just before an undershot reversal 
such as R in Figure 1, S is typically 
much behind the input, whereas just 
after the reversal he is typically 
much ahead of it. Conversely in 
overshooting, which is represented by 
the broken line in Figure 1, S is 
typically first ahead of the input and 
then behind. The sign of the change 
in time error introduced by an under- 
shoot or overshoot before a reversal 
is the opposite of the sign of the 
change introduced after the reversal.
Thus if the sign of the time
error is taken into account, as in 
calculating the overall lag, the change 
in the error just before the reversal 
tends to cancel the change just after 
the reversal. Mean lag is thus not 
appreciably affected by overshooting 
or undershooting. This is not the 
case for the mean error in time neg- 
lecting the sign, unless S consistently 
lags further behind the input than the 
sizes of the changes in time error introduced
by overshoots and undershoots.
It can never be the case for
overall measures involving squared
time errors, since $(L+c)^2+(L-c)^2 = 2L^2+2c^2 > 2L^2$,
where L is the lag and c the change in time error
introduced on either side of the reversal;
these measures are necessarily
inflated by overshooting and under- 
shooting, even though they exclude 
the shaded areas in Figure 1. 
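
The three cases in this paragraph can be checked numerically. The sketch below assumes that an undershoot changes the time error by an equal amount c in opposite directions just before and just after the reversal; the lag and change values are hypothetical.

```python
def mean_lag(time_errors):
    """Mean time error with sign taken into account."""
    return sum(time_errors) / len(time_errors)

def mean_abs_time_error(time_errors):
    """Mean time error neglecting the sign."""
    return sum(abs(e) for e in time_errors) / len(time_errors)

def mean_squared_time_error(time_errors):
    return sum(e * e for e in time_errors) / len(time_errors)

def around_reversal(lag, change):
    """Time errors just before and just after an undershot reversal."""
    return [lag + change, lag - change]

# Case 1: S lags further behind than the size of the change (L > c).
big_lag = around_reversal(0.20, 0.05)
print(mean_lag(big_lag))                 # about 0.20: the changes cancel
print(mean_abs_time_error(big_lag))      # about 0.20: also unaffected
print(mean_squared_time_error(big_lag))  # about 0.0425, above L*L = 0.04

# Case 2: the change is larger than the lag (c > L).
small_lag = around_reversal(0.02, 0.05)
print(mean_lag(small_lag))               # about 0.02: still unaffected
print(mean_abs_time_error(small_lag))    # about 0.05: inflated
```

The squared score exceeds L squared by c squared in both cases, which is the inequality given in the text.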

The failure to take account of the 
shaded areas in Figure 1 is a major 
disadvantage of using the overall 
error in time with sign neglected as 
the sole measure of performance in 
tracking inputs which reverse direc- 
tion. Time on target does not meet 
this particular difficulty. This meas- 
ure corresponds to increasing the 
thickness of the input function in 
Figure 1 to the width of the target 
area, and measuring the time for 
which the response line lies within its 
boundaries. However time on target 
takes no account of the size of the 
excursions from the target area, and 
its exact meaning has been questioned 
by Bahrick, Fitts, and Briggs (1957) 
on these and other grounds. 
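
The limitation noted here is easy to show with a sketch. The function below counts the sample points at which the response lies within the target area and converts the count to time; the sampling interval `dt`, the `half_width` parameter, and the two records are assumptions made for illustration.

```python
def time_on_target(input_pos, response_pos, half_width, dt):
    """Time for which the response lies within half_width of the input.

    dt is the time between successive sample points.
    """
    on_target = sum(
        1 for i, r in zip(input_pos, response_pos) if abs(r - i) <= half_width
    )
    return on_target * dt

input_pos = [0.0, 0.0, 0.0, 0.0, 0.0]
small_excursion = [0.0, 0.5, 1.5, 0.5, 0.0]
large_excursion = [0.0, 0.5, 9.0, 0.5, 0.0]

tot_small = time_on_target(input_pos, small_excursion, half_width=1.0, dt=0.5)
tot_large = time_on_target(input_pos, large_excursion, half_width=1.0, dt=0.5)
print(tot_small, tot_large)  # 2.0 2.0
```

The two hypothetical records receive the same score although one excursion is six times the size of the other.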

Cross-correlation is a more sophis- 
ticated technique for determining the 
average time relationships between 
the input and response. This involves 
correlating the two after moving the 
response by each of a number of 
fixed steps along the time dimension 
(Merrill & Bennett, 1956). The size 
of step which the response has to be 
moved forward or backward along
the time dimension in order to give
the largest correlation with the in- 
put indicates S’s overall lag or lead 
with respect to the input. 
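
The procedure can be sketched as follows. This is a minimal illustration, not the method of Merrill and Bennett: the test signal, the 0.05-second sampling interval, the 0.2-second lag, and the function names are all assumptions chosen so that the answer is known in advance.

```python
import math

def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def best_shift(input_pos, response_pos, max_shift):
    """Return (shift, r) for the shift in samples giving the largest r.

    A positive shift means the response lags the input by that many samples.
    """
    best = None
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            x, y = input_pos[:len(input_pos) - s], response_pos[s:]
        else:
            x, y = input_pos[-s:], response_pos[:len(response_pos) + s]
        r = pearson(x, y)
        if best is None or r > best[1]:
            best = (s, r)
    return best

def track(t):
    """Hypothetical irregular input built from three sine waves."""
    return (math.sin(2 * math.pi * 26.0 / 60.0 * t)
            + math.sin(2 * math.pi * 21.0 / 60.0 * t)
            + 2.0 * math.sin(2 * math.pi * 10.5 / 60.0 * t))

dt = 0.05  # seconds between samples
input_pos = [track(k * dt) for k in range(600)]
response_pos = [track(k * dt - 0.2) for k in range(600)]  # copy lagging by 0.2 s

shift, r = best_shift(input_pos, response_pos, max_shift=10)
print(shift * dt)  # estimated lag in seconds
```

Because the hypothetical response is an exact delayed copy, the largest correlation falls at a shift of four samples, or 0.2 seconds. A real record would give a lower and flatter peak.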


Average Error at Particular Points on 
the Input 


With harmonic inputs average 
errors can be calculated at particular 
points such as reversals in direction, 
points of inflection, and points half 
way in time between reversals and 
points of inflection. Figure 1 shows 
that between reversals it is often im- 
possible to specify with any degree of 
certainty the corresponding points on 
the wiggly response record; thus the 
mean lag computed as described 
above is probably the most meaningful
measure here. At the points of inflection
(I₁ and I₂ in Figure 1) the
mean lag is probably more or less
uncontaminated by errors in positioning,
since these points are placed
symmetrically on the input. However
at the points half way in time between
reversals and points of inflection
(M₁ and M₂ in Figure 1) a
tendency to undershoot increases the 
mean lag before reversals and cor- 
respondingly reduces it afterwards. 
A tendency to overshoot typically 
has an opposite though smaller effect. 
To the extent that the effect of over- 
shooting does not fully balance that 
of undershooting, simple variability 
in the overshoot-undershoot dimen- 
sion should have an effect similar to 
undershooting, though less marked. 

A reversal on the input (Point R in 
Figure 1) can be compared directly 
with the corresponding reversal on 
the response function (Point r or 
r'). At these points it is thus possible
to obtain four separate measures: a 
CE and an SD of error related to the 
position of the response irrespective 
of its timing, and two similar measures
related to the timing of the response
without regard to its position.
The sizes of the overshoots and un- 
dershoots (the heights of the shaded 
areas in Figure 1) probably provide 
the most useful measures in the posi- 
tion dimension. An alternative ver- 
sion of the CE in position, which 
shows only the extent to which the 
response is on average too far to one 
side or other of the input, is less rel- 
evant to the primary problem fac- 
ing S. 


An Illustration 


Table 1 shows results from an as 
yet unpublished experiment to illus- 
trate some of the more useful simple 
methods of scoring. Each entry in the 
table is based upon only 120 measure- 
ments, 10 from each of 12 Ss. The 
complete data for the Preview version 
thus required only 840 measure- 
ments, and the same is true of the Slit 
version. This has been done de- 
liberately in order to minimize labor, 
and to show how much can be discovered
in spite of this. More generous
sampling calls if possible for
automatic methods of scoring.


METHOD 


Apparatus. An irregular curved input, of 
which Figure 1 shows a sample, was drawn on 
a paper tape which moved towards S at a 
rate of 1.0 inch per second. The frequencies 
in the input were 26 cycles per minute, 21 
cycles per minute, and a component of 10.5 
cycles per minute which had twice the 
amplitude of the other two. The maximum 
amplitude of movement of the input was 1.75 
inches. A ball-point pen was used as a 
stylus. The stylus could be moved in a slit ly- 
ing over the paper tape at right angles to it. 
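
An input of this general kind can be approximated by summing three sine waves. The sketch below uses the three frequencies and the two-to-one amplitude ratio given above; the zero starting phases, the `scale` factor, and the function name are assumptions, since the original phases and the exact scaling to 1.75 inches are not stated.

```python
import math

FREQS_CPM = (26.0, 21.0, 10.5)  # cycles per minute, from the text
REL_AMPS = (1.0, 1.0, 2.0)      # the 10.5 cpm component has twice the amplitude

def input_position(t_seconds, scale=1.0, phases=(0.0, 0.0, 0.0)):
    """Lateral position of the input at time t (arbitrary units times scale).

    phases are assumed starting phases in radians; the source does not give them.
    """
    return scale * sum(
        amp * math.sin(2 * math.pi * (cpm / 60.0) * t_seconds + ph)
        for cpm, amp, ph in zip(FREQS_CPM, REL_AMPS, phases)
    )

# One 30-second trial sampled every 0.1 second; at 1.0 inch per second the
# distance along the tape in inches equals the time in seconds.
trial = [(k / 10.0, input_position(k / 10.0)) for k in range(300)]
print(trial[:3])
```

With these frequencies the summed pattern repeats only every two minutes, so a 30-second trial never shows S the same stretch of input twice.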

Task. For the data in Table 1 S had to keep 
the stylus on the input by moving it from side 
to side in the slit. In the Preview version he 
could see the input 2.5 seconds (2.5 inches) 
ahead of the stylus, as a walker can normally 
see the footpath ahead. In the Slit version he 
could see the input only in the slit in which 
the stylus moved. The slit had a width of .1 
inch. 

Procedure. Each trial lasted 30 seconds.
Half the Ss did the Preview version first, and
half the Slit version. Data were also collected
when a gap separated the input from the
stylus, and S had to keep the two aligned,
but they are not shown in Table 1. Practice was
deliberately restricted, so that for the results
in the table each S had performed for altogether
only between 2.0 and 5.0 minutes on
each version. The amounts of practice on
each version were counterbalanced between
Ss.

[Table 1 is not reproduced here; only its notes survive.]
Note. L and R indicate that the response was too far to the left or right, while U and O indicate a tendency to undershoot or overshoot. The mean CEs for time were all lags.
Significance markers in the table: Overall sample vs. Reversals, p <.05 or better; Preview vs. Slit, p <.05 or better; Reversals vs. Points of inflection, p <.05 or better.

Subjects. These were 12 young enlisted men 
in the British Royal Navy, none of whom had 
done much tracking. 

Scoring. Each mean in the table is based 
upon 10 measurements from the record of 
each S. The measures at reversals, at points 
of inflection, and at the two series of points 
midway in time between reversals and points 
of inflection came from the second half of the 
30-second trial. At the three latter sets of 
points on the input the time error corresponds 
to the horizontal distance in Figure 1 between 
the two functions. At reversals the time and 
position error are, respectively, the difference 
between when and where the input reversed 
direction and when and where the response did 
so. Where S stopped for an appreciable time 
before moving off again in the opposite direc- 
tion, as near r in Figure 1, the time error is 
computed from the average of the time at 
which he stopped and the time at which he 
moved off again. 

The measures in the overall sample rep- 
resent performance averaged over all points on 
the input. The sample is based upon 10 points 
separated by 1.6 seconds, also from the second 
half of the trial. (This periodicity is not re- 
lated in a simple way to any of the three fre- 
quency components of the input, and makes 
the sample cover approximately the same 
length of record as the samples at particular 
points on the input.) Where an error in time 
could not be measured at the selected point be- 
cause S had undershot, the next point was 
taken instead. 

Calculations. The mean SDs show the
variability within Ss, the SEs the variability 
between Ss. Thus the SE of an SD indicates 
the size of the individual differences in vari- 
ability. The reliability of the differences between
means was assessed by two-sample t
tests with 11 degrees of freedom, using two 
tails. 
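
These calculations can be sketched as follows. The sketch assumes that the t test was computed on the differences between the two scores of each S, which is what 11 degrees of freedom for 12 Ss implies; it also assumes the n - 1 form of the SD. The function names and the data in the usage lines are hypothetical.

```python
import math
import statistics

def subject_sd(measurements):
    """SD of one S's measurements: variability within that S."""
    return statistics.stdev(measurements)

def mean_and_se(values):
    """Mean over Ss and its standard error: variability between Ss."""
    se = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), se

def paired_t(scores_a, scores_b):
    """t and degrees of freedom for the difference between two conditions.

    Each S contributes one score per condition; the test is made on the
    per-S differences. Identical differences for every S give a zero SE.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff, se_diff = mean_and_se(diffs)
    return mean_diff / se_diff, len(diffs) - 1

# Hypothetical per-S scores for three Ss in two conditions.
t, df = paired_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(round(t, 3), df)  # 3.464 2
```

Applied to the SDs of 12 Ss, `mean_and_se` would give the mean SD and the SE of the SD described above, and `paired_t` would return 11 degrees of freedom.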


RESULTS AND DISCUSSION 

It has been suggested above that 
performance at reversals probably 
provides the most useful measures of 
the errors in position. Table 1 shows 
that in the Preview version S tended 
to undershoot by an average of .41 
millimeters (p<.002), whereas in the
Slit version he tended to overshoot by
an average of 1.13 millimeters
(p<.01). The SDs of the errors in 
position at reversals were over twice 
as large in the Slit version as with 
Preview (p<.001). In the overall 
sample, which shows performance 
averaged over all points on the input, 
the mean CE in position is unlikely to 
be very contaminated by time error, 
but the SD of the error in position 
necessarily contains an unknown 
component of time error. There was 
no significant difference between the 
overall sample and reversals in the 
extent to which the response was on 
average to one side or other of the 
input (p>.05), but as expected the
SDs of the overall sample were sig- 
nificantly too large in both versions 
as compared with the SDs at re- 
versals (p <.01). 

As already indicated, the mean 
CEs in time at reversals are not con- 
taminated by errors in positioning, 
and the mean CEs of the overall sam- 
ple and at the points of inflection are 
unlikely to be very contaminated. 
The combined means of the CEs at the
intermediate points before and after reversals are
also unlikely to be very contaminated,
but a tendency to undershoot,
and even simple variability in positioning
at reversals, is likely to increase
the mean time lags before reversals,
and to reduce correspondingly
the mean lags after reversals.
Table 1 shows that in the Preview 
version the mean time lag was twice 
as great at reversals as it was at the 
points of inflection on the input 
(p<.05). The average time lag at the
intermediate points on the input
before and after reversals, .47
seconds, lay in between. Thus in
the Preview version S did not simply 
reproduce the input as accurately 
as he could with a constant time lag; 
his timing varied significantly at 
different points on the input cycle.
The nature of this nonlinear relationship
would not have been so easy to
determine using the describing func- 
tions of the engineer, since servo- 
mechanisms do not normally behave 
like this (see the introduction). 

In the Slit version there were no 
significant differences between the 
mean time lags at different points on 
the input (p>.05). If differences 
exist, larger samples of data are re- 
quired in order to reveal them. None 
of the SDs of the error in time is an 
adequate measure of the overall 
variability in the time dimension. 
The SDs at reversals are likely to be 
smaller than the true overall vari- 
ability in timing, since only one 
point on the input is represented. 
However, unlike the remaining SDs 
in time in Table 1, the SDs at re- 
versals are not contaminated by 
variability in the position dimension; 
thus it is possible to make a valid 
comparison of the variability in tim- 
ing at these points between the two 
versions. Table 1 shows that there 
was significantly more variability in 
timing at reversals in the Slit version 
(p<.01), although the size of the 
difference was not large. 


Two Different Strategies 


The effect of size of preview was 
investigated in a previous experiment 
using the same input and apparatus 
as used here. Overall performance 
was found to change markedly when 
the preview was increased from zero 
(as in the Slit version) to .4 second; 
but there was no significant further 
change when the preview was ex- 
tended from .4 to 8.0 seconds (Poul- 
ton, 1954, p. 406). The 2.5-second 
preview used here was chosen to be 
well on that part of the function 
where overall performance had ceased 
to change appreciably with increase 
in preview. The differences between 
the Slit and Preview versions in Table
1 can thus be taken to represent the
maximum effect which a change in 
preview is likely to produce. By com- 
paring the two versions it is possible 
to indicate the response strategy 
adopted in each case, and thus to 
demonstrate the usefulness of the 
simple measures given in the table. 

In the Preview version Table 1 
shows that the mean SD of the error 
in position at reversals was less than 
half the size of that in the Slit ver- 
sion. In addition the mean lag was 
twice as large at reversals as at the 
points of inflection on the input, 
whereas there was little difference in 
the Slit version. Thus in attempting 
to keep as much as possible on the 
input, S adopted the strategy in the 
Preview version of minimizing over- 
shoots and undershoots at reversals 
by approaching them more slowly 
than did the input itself, and catching 
up again at the start of the return 
movement. 

The stimulus conditions which 
presumably determined this strategy 
were as follows: close to the points of 
inflection on the input a small error 
in timing produced a considerable 
misalignment, since here the input 
was moving at its maximum velocity 
(see Figure 1). In contrast, close to 
reversals even a relatively large error 
in timing did not produce much 
misalignment provided the ampli- 
tude of the response was correct, since 
the input was more or less stationary; 
whereas an overshoot or undershoot 
not only produced a misalignment 
proportional to its size, but the mis- 
alignment tended to remain for quite 
a time, since both input and response 
moved so slowly here (see Figure 1). 
Misalignment was thus minimized by 
concentrating upon correct timing 
near the points of inflection, and 
upon correct positioning near re- 
versals. 

In the Slit version Table 1 shows
that the SD of the error in position
at reversals was over twice as large 
as in the Preview version. Also S 
overshot by an average of 1.13 milli- 
meters, as compared with a mean 
undershoot of only about one third 
the size in the Preview version. In 
the time dimension the mean lag in 
the overall sample was almost three 
times as large in the Slit version as 
with Preview, although it was by no 
means as long as a visual reaction 
time (RT) which is usually given as 
about .18 second (Woodworth, 1938, 
p. 324). In addition, at least at re- 
versals, the timing was rather more 
variable in the Slit version, although 
there was less difference than in the
Preview version between the mean
lag at one point on the input and 
another. 

The Slit version presented what 
was effectively an insoluble problem: 
S had to keep up with an irregular input
which he could not see in advance.
In so far as he attempted to
compensate for his RT he had therefore
to act on his predictions as to
what the input was about to do, and
thus to risk overshooting when the
input stopped and reversed direction
unexpectedly, and undershooting
when the input went on further than 
he expected. In a typical RT experi- 
ment his behavior would produce so- 
called ‘‘premature” or ‘‘false’’ re- 
actions. Faced with this problem, S 
adopted a strategy which was a com- 
promise between on the one hand 
keeping up with the input regardless 
of overshooting and undershooting 
at reversals, and on the other hand of 
minimizing overshoots and under- 
shoots by following a full RT behind 
the input.


SUMMARY


For one-dimensional tracking sim- 
ple measures are described which can 
be made in terms of both position 
and time. The measures may be 
averaged over all points on the in- 
put, or may be averaged for only one 
kind of point; e.g., at the reversals in 
direction, or at the points of inflection 
on the input. At reversals it is 
possible to score the error between 
the corresponding points on the in- 
put and response functions, and thus 
to produce measures of error in posi- 
tioning which are uncontaminated by 
errors in timing and vice versa. 

Overshoots and undershoots are 
probably the most relevant errors of 
positioning. The extent to which the 
response is on average to one side or 
other of the input, the overall lag or 
lead, and the mean lag or lead at the 
points of inflection on the input and 
at points situated symmetrically on 
either side of it, can all probably be 
computed in a reasonably uncon- 
taminated form. Most other meas- 
ures described confound more seri- 
ously errors of positioning with errors 
of timing. 

Some unpublished data are used to 
illustrate the various measures. They 
show a previously unreported rela- 
tionship which would not have been 
so easy to specify using the describ- 
ing functions of the engineer. From 
the data it is possible to distinguish 
two different response strategies, 
which can be related to differences in 
stimulus conditions. The analysis 
demonstrates the increased insight 
into the stimuli influencing S in 
tracking, and into the strategies 
adopted, which can come from simple 
methods of scoring involving only a 
limited number of measurements.


The heterogeneity of schizophrenic
patients and the lack of success in 
relating variable schizophrenic func- 
tioning to diagnostic subtypes (King, 
1954) have indicated the serious 
limitations of the current neuro- 
psychiatric classification of schizo- 
phrenia. In response to these 
limitations interest has arisen in a 
two-dimensional frame of reference 
for schizophrenia. Such a conception 
is based on the patient’s life history 
and/or prognosis. A number of
terms (malignant-benign, dementia
praecox-schizophrenia, chronic-episodic,
chronic-acute, typical-atypical,
evolutionary-reactive, true-schizophreniform,
process-reactive)
have appeared in the literature
describing these two syndromes.
Process schizophrenia involves a long-term
progressive deterioration of the ad- 
justment pattern with little chance 
of recovery, while reactive schizo- 
phrenia indicates a good prognosis 
based on a history of generally ade- 
quate social development with nota- 
ble stress precipitating the psychosis. 

In view of the current favorable 
interest in this approach to the un- 
derstanding of schizophrenia (Rabin 
& King, 1958) the present investiga- 
tion is designed as an evaluative re- 
view of the literature on the process- 
reactive classification. 


EARLY PROGNOSTIC STUDIES 


The process-reactive distinction 
had its implicit origin in the work of 
Bleuler (1911). Prior to this the 
Kraepelinian influence had prevailed, 
with dementia praecox considered an 
incurable deteriorative disorder.
Bleuler, while adhering to an organic
etiology for schizophrenia, nonethe- 
less observed that some cases re- 
covered. This conclusion opened the 
field to a series of subsequent prog- 
nostic studies (Benjamin, 1946; 
Chase & Silverman, 1943; Hunt & 
Appel, 1936; Kant, 1940, 1941, 1944; 
Kretschmer, 1925; Langfeldt, 1951; 
Lewis, 1936, 1944; Malamud & 
Render, 1939; Mauz, 1930; Milici, 
1939; Paskind & Brown, 1940; Witt- 
man, 1941, 1944; Wittman & Stein- 
berg, 1944a, 1944b) eventuating in 
formalized descriptions of the process 
and reactive syndromes in terms of 
specific criteria. 

These early studies can be classified 
in three general categories: studies
correlating the outcome of a specific 
type of therapy with certain prog- 
nostic variables, studies descriptively 
evaluating prognostic criteria, and 
studies validating a prognostic scale. 

The first category is illustrated by 
the attempt of Chase and Silverman 
(1943) to correlate the results of 
Metrazol and insulin shock therapy 
with prognosis, using 100 schizo- 
phrenic patients treated with Metra- 
zol and 40 schizophrenic patients 
treated with insulin shock. 

In the first part of this study the 
probable outcome of each of the 150 
patients was estimated on the basis 
of prognostic criteria. The criteria 
considered of primary importance for 
a favorable prognosis were: short 
duration of illness, acute onset, ob- 
vious exogenic precipitating factors, 
early prominence of confusion,
atypical symptoms (marked by
strong mixtures of manic-depressive,
psychogenic, and symptomatic
trends), and minimal process symptoms
(absence of depersonalization,
derealization, massive primary per- 
secutory ideas, and sensations of in- 
fluence, conscious realization of per- 
sonality disintegration, bizarre delu- 
sions and hallucinations, marked 
apathy, and dissociation of affect). 
When these conditions were reversed 
the prognosis was least favorable. 
The following factors were considered 
less important for a favorable prog- 
nosis: history of previous illness, 
pyknic body type, extrovert tempera- 
ment and adequate prepsychotic life 
adjustment, catatonic and atypical 
subtypes. Asthenic body type, intro- 
version, inadequacy of prepsychotic 
reactions to life situations, onset of 
illness after the age of 40, and 
hebephrenic and paranoid subtypes 
were considered indicative of unfavorable
prognosis. Age of onset
under 40, sex, education, abilities,
and hereditary background were
not considered of prognostic impor- 
tance. An analysis of the prognosti- 
cally significant factors resulted in 
the evaluation of the prognosis for 
each case as good, fair, or poor. 


Following termination of shock 
treatment all patients were followed
up for an average of 10 months and
divided into three groups: much improved,
improved, and unimproved.
A comparison of the prognostic 
assessments with the results of shock 
indicated that of 43 cases in which 
the prognosis was considered good, 
33 showed remissions, while of 74 
cases with a poor prognosis, 63 did 
not improve. It was concluded that 
shock therapies were effective in 
cases of schizophrenia in which the 
prognosis was favorable, but were of 
little value when the prognosis was 
poor. 
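
The agreement between prognosis and outcome can be expressed as simple proportions. The sketch below uses only the four counts quoted above; the variable names are introduced here, and the cases rated as having a fair prognosis are left out because their outcomes are not reported.

```python
good_total, good_remitted = 43, 33     # good prognosis: cases showing remission
poor_total, poor_unimproved = 74, 63   # poor prognosis: cases not improving

def proportion(hits, total):
    """Fraction of cases whose outcome matched the prognosis."""
    return hits / total

good_rate = proportion(good_remitted, good_total)
poor_rate = proportion(poor_unimproved, poor_total)
combined_rate = proportion(good_remitted + poor_unimproved,
                           good_total + poor_total)

print(round(good_rate, 2))      # 0.77
print(round(poor_rate, 2))      # 0.85
print(round(combined_rate, 2))  # 0.82
```

On these figures the prognosis matched the outcome in roughly four cases out of five among those rated good or poor.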

The second part of the research involved
a reanalysis of the prognostic
criteria in the light of the results of
shock treatment. Short duration of 
illness and the absence of process 
symptoms were the most significant 
factors for favorable outcome, while 
long duration of illness (more than 2 
years) and the presence of process 
symptoms were primary in determin- 
ing poor prognosis. 

A descriptive review of prognostic 
factors is seen in Kant’s (1944) de- 
scription of the benign (reactive) 
syndrome as cases in which clouding 
and confusion prevail, or in which 
the schizophrenic symptoms center
around manic-depressive features,
or cases with alternating states of
excitement and stupor with frag- 
mentation of mental activity. Malig- 
nant (process) cases are characterized 
by direct process symptoms. These 
include changes in the behavior lead- 
ing to disorganization, dulling and 
autism, preceding the outbreak of 
overt psychosis. The most subtle 
manifestation of this is the typical 
schizophrenic thought disturbance. 
The patient experiences the process as 
a loss of normal feeling of personal- 
ity activity and the start of exper- 
iencing a foreign influence applied to 
mind or body. 

The third category includes the 
Elgin Prognostic Scale, constructed 
by Wittman (1941) to predict re- 
covery in schizophrenia. It is com- 
prised of 20 rating scales weighted 
according to prognostic importance: 
favorable factors are weighted nega- 
tively, and unfavorable factors are 
assigned positive weights. Initial 
validation involved 343 schizophrenic 
cases placed on shock treatment. 
Wittman and Steinberg (1944a) per- 
formed a follow-up study on 804 
schizophrenics and 156 manic-depressive
patients. The Elgin scale
proved effective in predicting the
outcome of therapy in 80-85% of the
cases in both studies, and has been
utilized in the work of Becker (1956,
1959), King (1958), and McDonough 
(1960) to distinguish the process- 
reactive syndrome. Included in the 
subscales of the Elgin scale are 
evaluations of prepsychotic personal- 
ity, nature of onset, and typicality 
of the psychosis relative to Krae- 
pelin’s definition. 


STUDIES WITH DETAILED PROCESS- 
REACTIVE CRITERIA 


The synthesis of early studies is 
found in the research of Kantor, 
Wallner, and Winder (1953) establishing
detailed criteria for distinguishing
the two syndromes on the basis of 
case history material. A process 
patient would exhibit the following 
characteristics: early psychological 
trauma, severe or long physical ill- 
ness, odd member of the family, 
school difficulties, family troubles 
paralleled by sudden changes in the 
patient’s behavior, introverted be- 
havior trends and interests, history 
of a breakdown of social, physical, 
and/or mental functioning, patho- 
logical siblings, overprotective or re- 
jecting mother, rejecting father, lack 
of heterosexuality, insidious gradual 
onset of psychosis without pertinent 
stress, physical aggression, poor re- 
sponse to treatment, lengthy stay in 
the hospital, massive paranoia, little 
capacity for alcohol, no manic-de- 
pressive component, failure under 
adversity, discrepancy between abil- 
ity and achievement, awareness of a 
change in the self, somatic delusions, 
a clash between the culture and the 
environment, and a loss of decency. 
In contrast, the reactive patient has 
these characteristics: good psycho- 
logical history, good physical health, 
normal family member, well adjusted 
at school, domestic troubles unaccom- 
panied by behavioral disruptions in 
the patient, extroverted behavior 
trends and interests, history of adequate
social, physical, and/or mental
functioning, normal siblings, nor- 
mally protective accepting mother, 
accepting father, heterosexual be- 
havior, sudden onset of psychosis 
with pertinent stress present, verbal 
aggression, good response to treat- 
ment, short stay in the hospital, 
minor paranoid trends, good capacity 
for alcohol, manic-depressive com- 
ponent present, success despite ad- 
versity, harmony between ability and 
achievement, no sensation of self- 
change, absence of somatic delusions, 
harmony between the culture and the 
environment, and retention of de- 
cency. 

The first three criteria apply to the 
patient’s behavior between birth and 
the fifth year; the next seven, be- 
tween the fifth year and adolescence; 
the next five, from adolescence to 
adulthood; the last nine, during 
adulthood. Using these 24 points to 
distinguish the two syndromes they 
tried to answer three questions: 

1. Do diagnoses based upon the 
Rorschach alone label as nonpsychot- 
ic a portion of the population of 
mental patients who are clinically 
diagnosed as schizophrenic? 

2. Can case histories of clinically 
diagnosed schizophrenics be differentiated
into two categories: process
and reactive? 

3. Are those cases rated psychotic 
from the Rorschach classed as process 
on the basis of case histories, and are 
those cases judged nonpsychotic from 
the Rorschach classified as reactive 
from the case histories? 

Two samples of 108 and 95 patients
clinically diagnosed as schizophrenic
were given the Rorschach and rated
according to the process-reactive
criteria. In the first sample of 108
patients, 57 were classified as psychotic
and 51 nonpsychotic on the basis
of the Rorschach alone, while in the
second sample, of 74 patients who
could be rated as process or reactive,
36 were classified as psychotic, and 
38 as nonpsychotic from their Ror- 
schach protocols. Those patients who 
were rated as reactive from their 
history were most often judged non- 
psychotic from the Rorschach, and 
those rated process from the case 
histories were most often judged as 
psychotic from the Rorschach. 

Only one judge was used in the 
second sample to rate the patients as 
process or reactive, but two judges 
were used in the first sample. Of the 
108 patients in this sample, both 
judges rated 86 cases, and were in 
agreement on 64 of these, which is 
greater than would be expected by 
chance. 
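
As a rough check on the agreement figure just cited, the sketch below computes the proportion of agreement (64 of the 86 jointly rated cases) and a one-sided binomial tail probability. The chance level of .50 is an assumption made here for illustration (it presumes two equally likely categories); the source states neither the chance model nor the test that was actually used.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


rated_by_both = 86   # cases rated by both judges (from the text)
agreements = 64      # cases on which the judges agreed (from the text)
chance_level = 0.5   # ASSUMPTION: two equally likely categories

proportion_agreement = agreements / rated_by_both
p_value = binomial_upper_tail(agreements, rated_by_both, chance_level)

print(f"proportion of agreement = {proportion_agreement:.3f}")
print(f"one-sided p under the assumed chance level = {p_value:.2e}")
```

Under a different chance model (for example, unequal base rates of process and reactive ratings) the tail probability would change, so the figure is only indicative.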

However, the accuracy of the 
schizophrenic diagnosis is question- 
able in this study. If the Rorschach 
diagnosis is followed, then it appears 
that reactive schizophrenics are not 
psychotic. Furthermore, the psychiatric 
diagnosis appears to be somewhat 
contaminated because it was established 
on the basis of data collected 
by all appropriate services of the 
hospital, including psychological ex- 
aminations. A similar type of con- 
tamination may have been present in 
classifying patients as process or re- 
active because one judge had re- 
viewed each case previously and had 
seen psychological examination and 
history materials together prior to 
making his ratings. Three difficulties 
can be found with the criteria for 
process-reactive ratings. First, case 
histories are often incomplete and 
the patient is unable or unwilling to 
supply the necessary information. 
Second, it is difficult to precisely 
apply some of the criteria. For ex- 
ample, what is the precise dividing 
line between oddity and normality 
within the family? Third, in order to 
classify a patient it is necessary to set 
an arbitrary cutoff point based on 
the number of process or reactive 
characteristics a patient has. Such a 
procedure needs validation. 

Nonetheless, the results of this 
study support the view that schiz- 
ophrenics can be classified as 
process or reactive, and that these 
syndromes differ in psychological 
functioning. 

Another rating scale which has 
been used extensively to distinguish 
prognostically favorable and prog- 
nostically unfavorable schizophrenics 
was developed by Phillips (1953). 
The scale was developed from the 
case histories of schizophrenic pa- 
tients who were eventually given 
shock treatment. The scale evaluates 
each patient in three areas: premorbid 
history, possible precipitating factors, 
and signs of the disorder. Premorbid 
history includes seven items on the 
social aspects of sexual life during 
adolescence and immediately beyond, 
seven items on the social aspects of 
recent sexual life, six items on per- 
sonal relations, and six items on re- 
cent premorbid adjustment in per- 
sonal relations. The sections of the 
scale which reflect the recent sexual 
life and its social history are the most 
successful in predicting the outcome 
of treatment. The items in the scales 
are arranged in order of increasing 
significance for improvement and 
nonimprovement away from the score 
of three, which is the dividing point 
between improved and unimproved 
groups. The premorbid history sub- 
scale has been utilized as the ranking 
instrument in the studies described 
by Rodnick and Garmezy (1957; 
Garmezy & Rodnick, 1959). 

Another approach to the separa- 
tion of schizophrenics into prognostic 
groups uses the activity of the auto- 
nomic nervous system as the basis for 
division (Meadow & Funkenstein, 
1952; Meadow, Greenblatt, Funkenstein, 
& Solomon, 1953; Meadow, 
Greenblatt, & Solomon, 1953). 
Meadow and Funkenstein (1952) 
worked with 58 schizophrenic pa- 
tients tested for autonomic reactiv- 
ity and for abstract thinking. Fol- 
lowing therapy the patients were 
divided into two groups, good or poor, 
depending on the outcome of the 
treatment. The battery of psycho- 
logical tests included the similarities 
and block design subtests of the 
Wechsler-Bellevue scale, the Benja- 
min Proverbs test, and the object 
sorting tests. The physiological test 
involved the systolic blood pressure 
reaction to adrenergic stimulation 
(intravenous Epinephrine) and cho- 
linergic stimulation (intramuscular 
Mecholyl). On the basis of the physi- 
ological and psychological testing, 
schizophrenic cases were divided into 
three types: Type I, characterized by 
marked response to Epinephrine, low 
blood pressure, and failure of the 
blood pressure to rise under most 
stresses, loss of ability for abstract 
thinking, inappropriate affect, and a 
poor prognosis; Type II, character- 
ized by an entirely different auto- 
nomic pattern, relatively intact ab- 
stract ability, anxiety or depression, 
and a good prognosis; Type III, 
showing no autonomic disturbance, 
relatively little loss of abstract abil- 
ity, little anxiety, well organized 
paranoid delusions, and a fair prog- 
nosis. 

However, as Meadow and Funken- 
stein (1952) point out, there is con- 
siderable overlap of the measures 
defining these types so that the classi- 
fication must be tentative. Also, of 
the psychological tests used, only 
Proverbs distinguished significantly 
between the patients when they were 
classified according to autonomic 
reactivity, while Block Design failed 
to distinguish significantly among 
any of the types. Further research using 
this method of division (Meadow, 
Greenblatt, Funkenstein, & Solomon, 
1953; Meadow, Greenblatt, & Solo- 
mon, 1953) served as a basis for in- 
vestigations of the process-reactive 
syndromes by King (1958) and 
Zuckerman and Grosz (1959). 

King (1958) hypothesized that 
predominantly reactive schizophren- 
ics would exhibit a higher level of 
autonomic responsiveness after the 
injection of Mecholyl than predomi- 
nantly process schizophrenics. The 
subjects were 60 schizophrenics who 
were classified as either process or 
reactive by the investigator 
and an independent judge using the 
criteria of Kantor et al. (1953). Only 
those subjects on whom there was 
classificatory agreement were used. 
This resulted in 22 process and 24 
reactive patients. In order to con- 
sider the process-reactive syndrome 
as a continuum, 16 subjects were 
randomly selected from these two 
groups and were ranked by two inde- 
pendent raters. 

While the patient was lying in bed 
shortly after awaking in the morning 
the resting systolic blood pressure was 
determined. The patient then re- 
ceived 10 milligrams of Mecholyl 
intramuscularly, and the systolic 
blood pressure was recorded at in- 
tervals up to 20 minutes. Then the 
maximum fall in systolic blood pres- 
sure (MFBP) below the resting blood 
pressure following the injection of 
Mecholyl was computed for the 
different time intervals. There was a 
significant difference in the MFBP 
score for the reactives as compared 
with the normals. For the 16 subjects, 
the correlation between the sets 
of ranks on the process-reactive dimension 
and MFBP was -.58. 
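
The two quantities in this procedure can be sketched as follows. The first function takes the maximum fall in systolic blood pressure (MFBP) as the largest drop below the resting value across the recorded readings; the second computes a Spearman rank correlation for untied ranks. The source says only that two sets of ranks were correlated, so the choice of Spearman's coefficient is an assumption, and all numbers below are hypothetical illustrations, not data from the study.

```python
def max_fall_in_bp(resting: float, readings: list[float]) -> float:
    """Largest drop of systolic pressure below the resting value."""
    return max(resting - r for r in readings)


def spearman_rho(ranks_a: list[int], ranks_b: list[int]) -> float:
    """Spearman rank correlation for two untied rankings of the same cases."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n**2 - 1))


# HYPOTHETICAL readings (mm Hg) taken at intervals after the injection.
example_mfbp = max_fall_in_bp(120, [112, 98, 104, 115])  # 120 - 98 = 22

# HYPOTHETICAL rankings of four cases on two measures, perfectly reversed.
example_rho = spearman_rho([1, 2, 3, 4], [4, 3, 2, 1])   # -1.0

print(example_mfbp, example_rho)
```

A negative coefficient, as reported in the text, means that cases ranked higher on one measure tend to be ranked lower on the other; its interpretation depends on the direction in which each ranking was scored.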

In a second part of the study 90 
schizophrenics, none of whom had 
participated in the first part, were 
classified as either process, process-reactive, 
or reactive, using the criteria 
of Kantor et al. (1953). On this 
basis the subjects were divided into 
three groups of 24. Also, scores for 22 
subjects were obtained on the Elgin 
Prognostic Scale, and 12 of these were 
rated independently by two raters. 
The mean MFBP scores were 17.04 for the 
process group, 22.79 for the process-reactive 
group, and 26.62 for the 
reactive group. An analysis of variance 
yielded an F ratio significant at 
the .01 level. The correlation between 
the Elgin Prognostic Scale and the 
MFBP scores for 22 patients was 
-.49. 

Results of both parts of the study 
revealed that the patients classified 
as reactive exhibited a significantly 
greater fall in blood pressure after the 
administration of Mecholyl than the 
process patients. This evidence 
points to diminished physiological 
responsiveness in process, but not in 
reactive schizophrenia. However, 
Zuckerman and Grosz (1959) found 
that process schizophrenics showed a 
significantly greater fall in blood pressure 
following the administration of 
Mecholyl than reactives. Since these 
results contradict King’s findings the 
question of the direction of respon- 
siveness to Mecholyl in these two 
groups requires further investigation 
before a conclusion can be reached. 


PROCESS-ORGANIC VERSUS 
REACTIVE-PSYCHOGENIC 

Brackbill and Fine (1956) sug- 
gested that process schizophrenics 
suffer from an organic impairment 
not present in the reactive case. They 
hypothesized that there would be no 
significant differences in the incidence 
of “organic signs” on the 
Rorschach between a group of proc- 
ess schizophrenics and a group of 
known cases of central nervous sys- 
tem pathology, and that both organic 
and process groups would show significantly 
more signs of organic involvement 
than the reactive group. 

The subjects consisted of 36 pa- 
tients diagnosed as process schizo- 
phrenics and 24 reactive schizophren- 
ics. The criteria of Kantor et al. 
(1953) were used to describe the pa- 
tients as process or reactive. Pa- 
tients were included only when there 
was complete agreement between 
judges as to the category of schizo- 
phrenia. Also included in the sample 
were 28 cases of known organic in- 
volvement. All patients were given 
the Rorschach, and the protocols were 
scored using Piotrowski’s (1940) 10 
signs of organicity. 

Using the criterion of five or more 
signs as a definite indication of or- 
ganic involvement there was no sig- 
nificant difference between the or- 
ganic and process groups, but both 
groups were significantly different 
from the reactives. Considering 
individual signs, four distinguished 
between the reactive and organic 
groups, while two distinguished between 
process and reactive groups. 
The authors concluded that the re- 
sults supported the hypothesis that 
process schizophrenics react to a 
perceptual task in a similar manner 
to that of patients with central ner- 
vous system pathology. No specific 
hypothesis was made about individ- 
ual Rorschach signs, but color nam- 
ing, completely absent in the reac- 
tives, was indicated as an example of 
concrete thinking and inability to 
abstract, suggesting that one of the 
critical differences between process 
and reactive groups is in terms of a 
type of thought disturbance. 

This study does not provide de- 
tailed information about the manner 
of establishing the diagnosis of schizophrenia 
or about the judges deciding 
the process and reactive syndromes. 
A further difficulty is the admitted 
inadequacy of the organic 
signs, since 66% of cases with organic 
pathology in this study were false 
negatives according to the Rorschach 
criteria. Thus while the existence of 
the process and reactive syndromes is 
supported by the results of this in- 
vestigation, there is less evidence of 
an organic deficit in process schizo- 
phrenics. 

Becker (1956) pointed out that the 
consistency of the prognostic findings 
in schizophrenia has led to postulat- 
ing two kinds of schizophrenia: proc- 
ess, with an organic basis, and reac- 
tive, with a psychological basis. He 
rejects this conclusion because research 
data in this area show considerable 
group overlap, making it 
clinically difficult and arbitrary to 
force all schizophrenics into one 
group or the other. Also, if schizophrenia 
is a deficit reaction which 
may be brought about by any combination 
of 40 or more etiological 
factors, then the conception of two 
dichotomous types of schizophrenia 
is not useful. Finally, he maintains 
that 20 years of research have failed 
to find clear etiological differences 
between any subgroupings. 

Instead, Becker stated that process 
and reactive syndromes should be 
conceived as end points on a con- 
tinuum of levels of personality or- 
ganization. Process reflects a very 
primitive undifferentiated personality 
structure, while reactive indicates a 
more highly organized one. He hy- 
pothesized that schizophrenics more 
nearly approximating the process 
syndrome would show more regressive 
and immature thinking processes 
than schizophrenics who more 
nearly approximate the reactive syndrome. 
His sample consisted of 51 
schizophrenics, 24 males and 27 fe- 
males, all under 41 years of age. 
Their thinking processes were evalu- 
ated by the Rorschach and the 
Benjamin Proverbs test. The 1937 
Stanford-Binet vocabulary test was 
used to estimate verbal intelligence. 
A Rorschach scoring system was used 
which presumably reflected the sub- 
jects’ level of perceptual develop- 
ment, while a scoring system was 
devised for the Proverbs which re- 
flected levels of abstraction. Since 
there is a high relationship between 
intelligence and ability to interpret 
proverbs, a more sensitive index of a 
thinking disturbance was considered 
to be a discrepancy score based on 
the standard score difference between 
a vocabulary estimate of verbal in- 
telligence and the proverbs score. 
Process and reactive ratings were 
made on the Elgin Prognostic Scale. 
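
The discrepancy score described above can be sketched as a difference of standard scores. The code below is a minimal illustration, assuming that both measures are standardized within the sample and that the proverbs score is subtracted from the vocabulary score; the source does not give the direction of subtraction or the norms used, and the raw scores shown are hypothetical.

```python
from statistics import mean, pstdev


def standard_scores(values: list[float]) -> list[float]:
    """Convert raw scores to z scores within the sample."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def discrepancy_scores(vocabulary: list[float], proverbs: list[float]) -> list[float]:
    """Vocabulary z score minus proverbs z score for each subject (ASSUMED direction)."""
    z_vocab = standard_scores(vocabulary)
    z_prov = standard_scores(proverbs)
    return [v - p for v, p in zip(z_vocab, z_prov)]


# HYPOTHETICAL raw scores for three subjects.
vocab = [10, 20, 30]
proverbs = [3, 2, 1]
disc = discrepancy_scores(vocab, proverbs)
print([round(d, 3) for d in disc])
```

With this direction of subtraction, a large positive value marks a subject whose proverb interpretation falls well below what the vocabulary estimate of verbal intelligence would lead one to expect.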

The Rorschach mean perceptual 
level score and the Elgin Prognostic 
Scale correlated -.599 for men and 
-.679 for women, indicating a significant 
relationship between the 
process-reactive dimension as evalu- 
ated from case history data and 
disturbances of thought processes as 
measured by the Rorschach scoring 
system. The proverbs-vocabulary 
discrepancy score was significantly 
related to the process-reactive dimen- 
sion for men, but not for women. No 
adequate explanation was found for 
this sex difference, which weakens 
the results. A further difficulty occurs 
because the case history and test 
evaluations were made by the same 
person. However, the results in part 
support the hypothesis, indicating 
evidence for a measurable dimension 
of regressive and immature thinking 
related to the process-reactive di- 
mension. 

McDonough (1960), acting on the 
assumption that process schizophrenia 
involves central nervous system 
pathology specifically cortical in 
nature, hypothesized that brain dam- 
aged patients and process schizo- 
phrenics would have significantly 
lower critical flicker frequency (CFF) 
thresholds and would be unable to 
perceive the spiral aftereffect significantly 
more often than reactive schizophrenics 
and normals. Four groups 
of 20 subjects each were tested. The 
organic group consisted of individuals 
with known brain damage. One 
hundred and sixty-one schizophrenic 
case histories were examined, and 76 
were chosen from this group to be 
rated on the Elgin Prognostic Scale. 
The 20 patients receiving the lowest 
point totals were selected as being 
most reactive, while those with the 20 
highest scores were considered most 
process. 
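
The selection of the two schizophrenic groups amounts to taking the extremes of a rated sample. A minimal sketch follows, assuming one total score per rated patient; the case labels and score values are hypothetical placeholders, not the study's data.

```python
def select_extreme_groups(scores: dict[str, float], k: int = 20):
    """Return (lowest-k cases, highest-k cases) by total scale score."""
    if 2 * k > len(scores):
        raise ValueError("groups would overlap: need at least 2 * k rated cases")
    ranked = sorted(scores, key=scores.get)   # ascending by point total
    return ranked[:k], ranked[-k:]


# HYPOTHETICAL point totals for the 76 rated cases (a fixed permutation of 0..75).
elgin_totals = {f"case_{i:02d}": float((i * 37) % 76) for i in range(76)}

most_reactive, most_process = select_extreme_groups(elgin_totals, k=20)
print(len(most_reactive), len(most_process))
```

Selecting only the extremes leaves the 36 middle-scoring cases out of the comparison, which sharpens group contrast but means the results say nothing about intermediate patients.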

Results of the experiment revealed 
that organic patients were signifi- 
cantly different from all other groups 
in CFF threshold and ability to perceive 
the spiral aftereffect. Process 
and reactive schizophrenics did not 
differ from each other on either task, 
but reactive schizophrenics had higher 
CFF thresholds than normals. These 
results do not indicate demonstrable 
cortical defect in either process or 
reactive schizophrenia. 


PROCESS-POOR PREMORBID HISTORY 
VERSUS REACTIVE-GOOD 
PREMORBID HISTORY 


Rodnick and Garmezy (1957), dis- 
cussing the problem of motivation in 
schizophrenia, reviewed a number of 
studies in which the Phillips prog- 
nostic scale was used to classify 
schizophrenic patients into two 
groups, good and poor. For example, 
Bleke (1955) hypothesized that pa- 
tients whose prepsychotic life adjust- 
ment was markedly inadequate would 
have greater interferences and so show 
more reminiscence following censure 
than patients whose premorbid his- 
tories were more adequate. 

The subjects were presented with a 
list of 14 neutrally toned nouns pro- 
jected successively on a screen. Each 
subject was required to learn to these 
words a pattern of pull-push movements 
of a switch lever. For half the 
subjects in each group learning took 
place under a punishment condition, 
while the remaining subjects were 
tested under a reward condition. 
The subjects consisted of 40 normals, 
20 poor premorbid schizophrenics, 
and 20 good premorbid schizophren- 
ics. The results confirmed the hy- 
pothesis. 

A reanalysis of Dunn’s (1954) data 
indicated that a poor premorbid 
group showed discrimination deficits 
when confronted with a scene depict- 
ing a mother and a young boy being 
scolded, but good premorbid and 
normal subjects did not show this 
deficit. 

Mallet (1956) found that poor pre- 
morbid subjects in a memory task for 
verbal materials showed significantly 
poorer retention of hostile and non- 
hostile thematic contents than did 
good premorbid and normal subjects. 
Harris (1955) has found that, in contrast 
to goods and normals, poor 
premorbids have more highly deviant 
maternal attitudes. They attribute 
more rejective attitudes to their 
mothers, and are less able to critically 
evaluate their mothers. Harris (1957) 
also found differences among the 
groups in the size estimation of 
mother-child pictures. The poors 
significantly overestimated, while the 
goods underestimated, and the nor- 
mals made no size error. 

Rodnick and Garmezy (1957) re- 
ported a study using Osgood’s (1952) 
semantic differential techniques in 
which six goods and six poors rated 20 
concepts on each of nine scales se- 
lected on the basis of high loadings on 
the evaluative, potency, and activity 
factors. Good and poor groups dif- 
fered primarily on potency and ac- 
tivity factors. The poors described 
words with negative value as more 
powerful and active. The goods could 
discriminate among concepts, but the 
poors tended to see most concepts as 
powerful and active. 

Rodnick and Garmezy (1957) also 
investigated differences in authority 
roles in the family during adolescence 
in good and poor premorbid patients. 
While results were tentative at that 
time, they suggested that the mothers 
of poor premorbid patients were perceived 
as having been more dominating, 
restrictive, and powerful, while 
the fathers appeared ineffectual. The 
pattern was reversed in the good pre- 
morbid patients. 

Alvarez (1957) found significantly 
greater preference decrements to 
censured stimuli by poor premorbid 
patients. This result was consistent 
with the results of Bleke’s (1955) and 
Zahn’s (1959) observations of re- 
versal patterns of movement of a 
switch lever following censure. These 
experiments suggested an increased 
sensitivity of the poor premorbid 
schizophrenic patient to a threaten- 
ing environment. 

These studies reported by Rodnick 
and Garmezy (1957) indicated that it 
was possible, using the Phillips scale, 
to effectively dichotomize schizo- 
phrenic patients. However, the 
Phillips scale had predictive validity 
only when applied to male patients. 
Within this frame of reference it was 
also possible to demonstrate differ- 
ences between goods and poors in 
response to censure, and in percep- 
tion of familial figures. Variability in 
the results of schizophrenic perform- 
ance was considerably reduced by 
dichotomizing the patients, but it was 
often impossible to detect significant 
differences between the performance 
of good premorbid schizophrenics 
and normals. Rodnick and Garmezy 
(1957) suggest that the results be 
considered as preliminary findings 
pending further corroboration, though 
providing support for the concept of 
premorbid groups of schizophrenics 
differing in certain psychological dimensions. 


PROCESS-REACTIVE EMPIRICAL- 
THEORETICAL FORMULATIONS 

Fine and Zimet (1959; Zimet & 
Fine, 1959) used the same population 
employed by Kantor et al. (1953) 
and the same criteria for distinguish- 
ing the process and reactive patients. 
For this study only those cases were 
included where there was complete 
agreement among the judges as to the 
category of schizophrenia. They 
studied the level of perceptual or- 
ganization of the patients as shown 
on their Rorschach records. The 
process group was found to have 
significantly more immature, regres- 
sive perceptions, while the reactive 
group gave more mature and more 
highly organized responses. The 
findings indicated that archaic and 
impulse-ridden materials break 
through more freely in process schizophrenia, 
and that there is less ego 
control over the production of more 
regressive fantasies. Zimet and Fine 
(1959) speculated that process schizophrenia 
mirrors oral deprivation of 
early ego impoverishment, so that 
either regression or fixation to an 
earlier developmental stage is reflected 
in his perceptual organization. 
In contrast, it is possible that the reactive 
schizophrenic’s ego weakness 
occurs at a later stage in psychosexual 
development, and any one event may 
reactivate the early conflict. 

An amplification of the process-reactive 
formation has been suggested 
by Kantor and Winder (1959). They 
hypothesized that schizophrenia can 
be understood as a series of responses 
reflecting the stage of development in 
the patient’s life at which emotional 
support was severely deficient. Schiz- 
ophrenia can be quantitatively de- 
picted in terms of the level in life to 
which the schizophrenic has regressed, 
and beyond which development was 
severely distorted because of disturb- 
ing life circumstances. The earlier in 
developmental history that severe 
stress occurs, the more damaging the 
effect on subsequent interpersonal 
relationships. Sullivan (1947) sug- 
gested five stages in the development 
of social maturity: empathic, proto- 
taxic, parataxic, autistic, and syn- 
taxic. The most malignant schizo- 
phrenics are those who were severely 
traumatized in the empathic stage of 
development when all experience is 
unconnected, there is no symbolism, 
and functioning is at an elementary 
biological level. The schizophrenic 
personality originating at this stage 
may show many signs of organic 
dysfunction. Prognosis will be most 
unfavorable, and delusional forma- 
tion will tend to be profound. 

In view of the primitive symbolic 
conduct and the lack of a self-concept 
in the prototaxic stage, the schizophrenic 
personality referable to this 
stage will be characterized by magical 
thinking and disturbed communication. 
The delusion of adoption 
often occurs. However, these patients 
are more coherent than those of the 
previous level. 

The parataxic schizophrenic state 
involves the inability of the self- 
system to prevent dissociation. The 
autonomy of the dissociations results 
in the patient’s fear of uncontrollable 
inward processes. Schizophrenic 
symptoms appear as regressive be- 
havior attempting to protect the self 
and regain security in a threatening 
world. Delusional content usually 
involves world disaster coupled with 
bowel changes. Nihilistic delusions 
are common. While there is evidence 
of a self-system in these patients, 
prognosis remains unfavorable. 

The patient who has regressed to 
the autistic stage, although more 
reality oriented than in the previous 
stages, is characterized by paranoid 
suspiciousness, hostility, and patho- 
logical defensiveness against inade- 
quacy feelings. A consistent system 
of delusions will be articulated and 
may bring the patient into conflict 
with society. However, prognosis is 
more favorable at this stage than 
previously. 

An individual at the syntaxic level 
has reached consensus with society, 
so that if schizophrenia occurs it will 
be a relatively circumscribed reac- 
tion. Onset will be sudden with 
plausible environmental stresses, and 
prognosis is relatively good. 

Becker (1959) also elaborated on 
the lack of a dichotomy in schizo- 
phrenia. Individual cases spread out 
in such a way that the process syn- 
drome moves into the reactive syn- 
drome, so that the syndromes prob- 
ably identify the end points of a 
dimension of severity. At the process 
end of the continuum the develop- 
ment of personality organization is 
very primitive, or involves severe 
regression. There is a narrowing of 
interests, rigidity of structure, and 
inability to establish normal hetero- 
sexual relationships and independ- 
ence. In contrast, the reactive end of 
the continuum represents a higher 
level of personality differentiation. 
The prepsychotic personality is more 
normal, heterosexual relations are 
better established, and there is greater 
tolerance of environmental stresses. 
The remains of a higher develop- 
mental level are present in regression 
and provide strength for recovery. 

Becker (1959) factor analyzed 
some of the data from his previous 
study (Becker, 1956). The factored 
matrix included a number of back- 
ground variables, the 20 Elgin Prog- 
nostic Scale subscores, and a Ror- 
schach genetic level score (GL) based 
on the first response to each card. 
Seven centroid factors were extracted 
from the correlation matrix. Factors 
4, 6, and 7 represented intelligence, 
cooperativeness, and marital status 
of parents, respectively. The highest 
loadings on Factor 5 were history of 
mental illness in the family, excellent 
health history, lack of precipitating 
factors, and clouded sensorium. The 
Rorschach GL score and the Elgin 
scales did not load significantly on 
Factors 4 through 7. 

The remaining three factors paral- 
lel the factors Lorr, Wittman, and 
Schanberger (1951) found with 17 of 
the 20 Elgin scales using an oblique 
solution instead of the orthogonal 
solution used in this study. Factor 1 
is called schizophrenic withdrawal, 
loading on defect of interest, insidious 
onset, shut-in personality, long dura- 
tion of psychosis, and lack of pre- 
cipitating factors. At one end this 
factor defines the typical process 
syndrome, while the other end describes 
the typical reactive syndrome. 
The Rorschach GL score loaded -.46 
on Factor 1. 

Factor 2, reality distortion, loads 
on hebephrenic symptoms, bizarre 
delusions, and inadequate affect. 
The Rorschach GL score loaded -.64 on 
this factor. Factor 3 loaded on in- 
difference and exclusiveness-stubborn- 
ness. The opposite pole of this factor 
involves insecurity, inferiority, self-consciousness, 
and anxiety. The Rorschach 
GL score loaded .25 on this 
factor. 

Further analysis indicated that 
when Factors 1 and 2 were plotted 
against each other an oblique rota- 
tion was required, introducing a cor- 
relation of from .60 to .70 between 
schizophrenic withdrawal and reality 
distortion factors. Similar oblique- 
ness was found between Factors 2 
and 3, suggesting the presence of a 
second-order factor. 

However, the sampling of behavior 
manifestations in the Elgin scale 
overweights the withdrawal factor, 
which gives Factor 1 undue weight 
and biases the direction of a second- 
order factor toward the withdrawal 
factor. Also, it is not possible to ac- 
curately locate second-order factors 
with only seven first-order factors as 
reference points. In addition, sample 
size and related sampling errors 
limited inferences about a second- 
order factor. There is the suggestion, 
however, of the existence of a general 
severity factor, loading primarily on 
schizophrenic withdrawal and reality 
distortion. 

The author suggests utilizing the 
evidence from this study to form an 
index of severity of psychosis which 
could be used to make diagnoses with 
prognostic significance. This diag- 
nostic procedure would include factor 
estimates of schizophrenic withdrawal 
and emotional rigidity, based on 
Elgin scale ratings, and reality distor- 
tion, based on the Rorschach GL 
score. 

Garmezy and Rodnick (1959) 
pointed out that despite failure to 
find support for a fundamental bio- 
logical deviation associated with 
schizophrenia (Kety, 1959), the view 
of schizophrenia as a dichotomous 
typology influenced either by somatic 
or psychic factors has continuously 
been advanced. They maintain that 
on the basis of empirical evidence 
there is little support for a process- 
organic versus reactive-psychogenic 
formulation of schizophrenic etiology. 

Reviewing a series of studies using 
the Phillips scale as a dichotomizing 
instrument (Alvarez, 1957; Bleke, 
1955; Dunham, 1959; Dunn, 1954; 
Englehart, 1959; Farina, 1960; 
Garmezy, Stockner, & Clarke, 1959; 
Harris, 1957; Kreinik, 1959; Rodnick 
& Garmezy, 1957; Zahn, 1959), 
Garmezy and Rodnick concluded 
that the results indicate two groups of 
schizophrenic patients differing both 
in prognostic potential and sensitivity 
to experimental cues. There is an 
interrelationship among the variables 
of premorbid adequacy, differential 
sensitivity to censure, prognosis, and 
types of familial organization. This 
suggests a relationship between vary- 
ing patterns of early experience and 
schizophrenia, though it does not 
embody the acceptance of a given 
position regarding psychological or 
biological antecedents in schizo- 
phrenia. 

Reisman (1960), in an attempt to 
explain the heterogeneous results of 
psychomotor performance in schizo- 
phrenics, suggested that there were 
two groups of schizophrenics, process 
and reactive, differing in motivation. 
The process group was seen as more 
withdrawn and indifferent to their 
performance, and consequently re- 
flecting a psychomotor deficit not 
present in reactives. In order to test 
this hypothesis 36 reactives, 36 proc- 
ess patients, and 36 normals per- 
formed a card-sorting task. The 
groups were distinguished according 
to the criteria of Kantor, Wallner, 
and Winder (1953). On Trial 1 all 
subjects were requested to sort as 
rapidly as possible. Then the sub- 
jects were assigned to one of four 
experimental conditions, with an at- 
tempt made to equate across the 
experimental conditions for age, esti- 
mated IQ, length of hospitalization, 
and initial sorting time. Condition 1 
(FP) involved sorting the cards seven 
more times and if the sort was fast 
the subjects were shown stress- 
arousing photographs. If they sorted 
slowly no photographs were shown. 
Condition 2 (SP) was the reverse of 
this. Condition 3 (FL) and Condi- 
tion 4 (SL) were similar to the first 
two conditions except that a nonrein- 
forcing light was used instead of the 
pictures. After Trial 8 all subjects 
were informed that there would be no 
more pictures or light, but were asked 
to sort rapidly for three more trials. 
With four conditions on Trials 2 
through 8, 10 subjects from each of 
the three groups participated in each 
of the two picture conditions, while 
eight subjects from each group par- 
ticipated in each of the light condi- 
tions. 
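
The allocation just described can be tallied to confirm that it accounts for every subject. The sketch below uses only the cell sizes given in the text; the dictionary layout and variable names are illustrative choices.

```python
groups = ("process", "reactive", "normal")

# Subjects per group in each condition on Trials 2 through 8 (from the text).
subjects_per_cell = {"FP": 10, "SP": 10, "FL": 8, "SL": 8}

per_group = sum(subjects_per_cell.values())
total_subjects = per_group * len(groups)

print(f"subjects per group = {per_group}")
print(f"total subjects = {total_subjects}")
```

The per-group total of 36 matches the 36 process patients, 36 reactives, and 36 normals reported for the study, although the unequal cell sizes mean the picture and light conditions are not compared on equal numbers.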

The results indicated that the nor- 
mals performed about the same under 
all conditions. The process group 
under FP sorted as fast as normals, 
but performed slowly under the other 
three conditions, while the reactives 
were slowest under FP but were as 
fast as normals under the other three 
conditions. Within all three groups 
performance under FL did not differ 
significantly from performance under 
SL. Under FL and SL, however, re- 
actives and normals sorted more 
rapidly than the process group. These 
results supported the hypothesis of a 
motivational deficit for process schiz- 
ophrenics. The results also indi- 
cated that the pictures were nega- 
tively reinforcing for the reactives, 
while the process patients were moti- 
vated to see them. This suggested a 
withdrawal differential. The with- 
drawal of the process patients is of 
such duration that supposedly threat- 
ening photographs cause little anxi- 
ety. In contrast, reactive withdrawal 
is motivated by an environment that 
recently became unbearable. Con- 
fronted with pictures representing 
this environment the reactive patient 
experiences anxiety and avoidance. 
However, the results of this experi- 
ment are in contrast to the findings of 
Rodnick and Garmezy (1957) that 
prolonged exposure to social censure 
will result in greater sensitivity to 
that stimulation. 


SUMMARY 


This review of all the research on 
the process-reactive classification of
schizophrenia strongly indicates that
it is possible to divide schizophrenic 
patients into two groups differing in 
prognostic and life-history variables. 
Using such a division it is also pos- 
sible to demonstrate differences be- 
tween the two groups in physiological 
measures and psychological dimen- 
sions. 

The result of such an approach has 
been to clarify many of the hetero-
geneous reactions found in schizo-
phrenia. It also appears that the
dichotomy is somewhat artificial and 
really represents end points on a 
continuum of personality organiza- 
tion. The most process patient rep- 
resents the extreme form of personal- 
ity disintegration, while the most 
reactive patient represents the ex- 
treme form of schizophrenic integra- 
tion. The reactions of this type of 
patient are often difficult to distin- 
guish from behavior patterns of nor- 
mal subjects. There does not appear 
to be any significant evidence to sup- 
port the contention of a process-
organic versus a reactive-psychogenic
formulation of schizophrenic etiol- 
ogy. 

It is difficult to decide on the 
most appropriate criteria for selecting 
schizophrenic subjects so as to reduce 
their response variability. Prefer- 
ences are generally found for one of 
three sets of criteria: Kantor, Wallner, 
and Winder’s (1953) items, the Elgin 
Prognostic Scale (1944), or the 
Phillips scale (1953). The criteria of
Kantor et al. (1953) do not provide
a quantitative ordering of the vari-
ables, and are descriptively vague in
several dimensions as well as depend-
ing upon life history material which
is not always available. While the 
Elgin scale does provide a quantita- 
tive approach, it also has the disad- 
vantages of descriptive vagueness 
and excessive dependence upon life 
history material. The Phillips scale 
eliminates some of these difficulties, 
but its validity is limited to the ade- 
quacy or inadequacy of social-sexual 
premorbid adjustment. The need for 
more feasible criteria may be met by 
the factor analysis of pertinent vari- 
ables to obtain a meaningful severity 
index (Becker, 1959), or by using 
rating scales in which the patient 
verbally supplies the necessary in- 
formation. An example of the latter 
is the Ego Strength scale (Barron, 
1953), recently utilized in distinguish- 
ing two polar constellations of schizo-
phrenia: a process type with poor
prognosis and grossly impaired ab- 
stract ability, and a reactive type 
characterized by good prognosis and 
slight abstractive impairment (Her- 
ron, in press). 

This need for more efficient differ- 
entiating criteria mitigates some of 
the significance of present findings 
using the process-reactive dimension. 
Nonetheless, the process-reactive re- 
search up to this time has succeeded 
in explaining schizophrenic hetero- 
geneity in a more meaningful manner 
than previous interpretations adher- 
ing to various symptom pictures and 
diagnostic subtypes. Consequently, 
there appears to be definite value in 
utilizing the process-reactive classifi-
cation of schizophrenia.


Hypnotic research can be broadly
characterized as having either an in- 
trinsic or instrumental orientation. 
Intrinsically oriented research is con- 
cerned with the phenomena and 
nature of hypnosis itself whereas in- 
strumentally oriented research util- 
izes hypnosis to produce some condi- 
tion which is the object of study, e.g., 
personality alteration, psychopathol- 
ogy. Although both research orienta- 
tions present difficult methodological 
problems, this communication will 
limit itself to problems associated 
with the instrumental use of hypno- 
sis. 

Despite its ability to command en- 
during interest, instrumental hypnot- 
ic research has remained relatively 
inconsequential and isolated. One of 
the principal reasons for this state of
affairs is the lack of criteria for de- 
termining the relevance of hypnoti- 
cally induced behavior to clinical or 
natural behavior. In the absence of 
adequate criteria, the data of in- 
strumental hypnotic research tend to 
be either rejected or consigned to the 
limbo of ambiguity. Adams (1957), 
in his review of laboratory studies of 
behavior without awareness, ex- 
cluded studies involving posthypnot- 
ic suggestion, automatic writing, 
extrasensory perception, and proc- 
esses of which the subject is unaware. 
A paralyzing caution was displayed 
by Ainsworth (1954) in her review of 
Rorschach validation research:

Hypnosis provides another method for
artificially altering the state of the subject
while undergoing the Rorschach examination,
although hypnotic studies are open to the
question of whether the hypnotically induced
state is comparable enough to the “genuine”
state to provide validation evidence (p. 480).


Most reviewers, however, do not 
even bother to mention the exclusion 
of hypnotic research. At the risk of 
being charitable, it is likely that re- 
jecting attitudes toward instru- 
mental hypnotic research arise more 
from the lack of criteria for determin- 
ing relevance than from prejudice. In 
lieu of such criteria, most investiga- 
tors have approached this issue by 
ignoring it or by assuming that the 
induced behavior is equivalent in all 
respects to its natural counterpart. 
Phenotypic identity, however, does
not necessarily imply genotypic iden-
tity; i.e., the fact that behavior
similar to anxiety can be produced by
hypnosis does not mean that the
mechanisms of hypnosis are the same 
or similar to the processes underlying 
clinical anxiety. Weitzenhoffer (1953) 
has pointed out that hypnotically 
induced phenomena resembling
psychodynamic manifestations are 
apt to lack affective tone. He also 
recognized the importance of induc- 
ing an appropriate genotype by his 
assertion that affective tone is most 
apt to be absent “when the sugges-
tions are aimed at directly bringing
about the overt manifestations rather
than creating the type of factors
normally responsible for these” (p.
217).

The topic of hypnotically induced 
psychopathology will serve as the 
focus of the inquiry because it high- 
lights both the methodological and 
conceptual problems involved in the 
laboratory investigation of hypnot-
ically induced conditions. Such a
focus achieves enhanced significance 
because the genotypic-phenotypic 
relationships that constitute psycho- 
pathology represent one of the cen- 
tral problems in most psychoanalyt- 
ically oriented theories of personal- 
ity. 


A Paradigm for the Hypnotic Induc- 
tion of Psychopathology 


A paradigm for demonstrating 
valid psychopathology must include 
a procedure for separating the 
mechanisms of suggestion from the 
mechanisms of pathogenic psycho- 
dynamics. Although it is doubtful 
that the mechanisms of hypnotic 
suggestion are similar to the mecha- 
nisms of pathogenic psychodynamics, 
clinical experience with hypnosis 
(Eisenbud, 1937; Rosen, 1953) indi- 
cates that hypnotic suggestion can 
set in motion nonsuggested patho- 
genic psychodynamics and observable 
psychopathology. Thus, hypnotic 
suggestion should be used only to 
induce a process that, under certain 
specifiable conditions, is theoretically 
capable of producing pathogenic 
psychodynamics and psychopathol- 
ogy. The hypnotically induced 
process defines the genotype, and the 
behavioral outcome defines the
phenotype. The genotype is defined 
operationally by the statements in 
the hypnotic suggestions; the pheno- 
type is defined operationally by a 
description of the subject’s overt be- 
havior. The description of the 
phenotype is considered to be opera- 
tionally valid clinical psychopathol- 
ogy only if it satisfies the defining 
criteria of a given classification of 
psychopathology. In this way, the 
investigator can operationally tie 
down the genotype, or psycho- 
dynamics, that produces the observed 
psychopathology instead of having to 
rely upon the uncertainties of clinical
inference in regard to natural psycho-
pathology.

The production of operationally 
valid clinical psychopathology by a 
hypnotically induced process permits 
the inference that the genotype is 
adequate, which in turn is supporting 
evidence for the theory from which 
the genotype is derived. If the geno- 
type does not produce psychopathol-
ogy, there are two interpretive
alternatives available: the genotype 
is inadequate and the theory from 
which it is derived is not supported, 
or the conditions of the experiment 
were unfavorable for an adequate 
test of the theory. 

The foregoing considerations sug- 
gest four principles, or criteria, that 
should guide research in this area. 
First, the induced process must in no 
way include cues as to how the ex- 
perimenter expects the subject to 
respond in any other respect. Orne 
(1959) has demonstrated convinc- 
ingly the sensitivity of hypnotized 
subjects to the expectations of the 
experimenter and the “demand” 
characteristics of the experimental 
design. Second, the induced process 
must produce other processes and 
behavior; that is, it must be re- 
sponse-producing. Third, some of 
these responses must satisfy the de- 
fining criteria for inclusion in some 
classification of psychopathology. 
Finally, as Orne (1959) suggests, some 
of the subjects must be asked by a 
co-experimenter, unknown to the ex- 
perimenter, to fake hypnosis in order 
to determine the demand character- 
istics of the research. 
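
The four criteria can also be read as a checklist against which a
given study design is scored. The sketch below is offered only as
an illustration of that reading; the class, its field names, and
the example design are hypothetical and describe no particular
study.

```python
from dataclasses import dataclass


@dataclass
class StudyDesign:
    """Hypothetical record of how a design stands on the four criteria."""
    no_response_cues: bool          # 1: induced process gives no cues about expected responses
    response_producing: bool        # 2: induced process produces other processes and behavior
    meets_pathology_criteria: bool  # 3: some responses satisfy a classification of psychopathology
    simulating_controls: bool       # 4: some subjects are asked to fake hypnosis


def unmet_criteria(design):
    """Return the numbers (1 to 4) of the criteria the design fails."""
    checks = [
        design.no_response_cues,
        design.response_producing,
        design.meets_pathology_criteria,
        design.simulating_controls,
    ]
    return [number for number, ok in enumerate(checks, start=1) if not ok]


# A made-up design that tells subjects how to respond and uses no simulators.
example = StudyDesign(
    no_response_cues=False,
    response_producing=True,
    meets_pathology_criteria=True,
    simulating_controls=False,
)
print(unmet_criteria(example))
```

A design that returns an empty list satisfies the paradigm as
stated; any number returned marks a criterion that limits the
interpretation of the results.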

REVIEW OF RELEVANT RESEARCH 

Research in the area of hypnoti- 
cally induced psychopathology falls 
into three categories. 


Direct Suggestion 


In studies of this type, suggestion 
is used to produce a given response
which is considered to be clinically
meaningful. By suggestion the ex- 
perimenter reproduces in the sub- 
ject a specific mood, attitude, affect, 
or symptom. Although most of this 
research has been reviewed elsewhere 
(Weitzenhoffer, 1953), a recent in- 
vestigation by Levitt, den Breeijen, 
and Persky (1960) will be presented 
and discussed in detail because it is a 
particularly good example of the in- 
herent defects in this popular ap- 
proach. The procedure is straight- 
forward: the subject is made to feel 
anxious by listening to a taped pres- 
entation of a series of somewhat 
repetitious phrases of increasing emo- 
tional intensity containing a variety 
of synonyms for the emotions of 
anxiety and fear. 

A deliberate effort was made to 
produce anxiety in “pure” form be-
cause, under natural conditions,
there is usually an admixture of
anxiety, depression, hostility, etc.
Their attempt to produce anxiety in
pure form is, therefore, an interfer-
ence with the idiosyncratic phenotype
and entirely destroys its clinical sig-
nificance. Due to its covert intra-
psychic origins, anxiety is not experi-
enced in the same way by everyone,
nor is its presence always detected
and identified. Moreover, their
emphasis upon such words as “fear,”
“dread,” “apprehension,” and
“panic” may well be reproducing
responses to external threat rather 
than generating responses to an un- 
known internal threat, which is a 
distinction often used to differentiate 
anxiety from fear. The affect of anx- 
iety is even more complicated than 
they have observed because it can be 
managed in different ways. It may 
be managed defensively by hostile 
or depressive reactions, projected as 
in a phobia, or converted into somat- 
ic processes. A study which ignores 
the personal equation in the hypnotic
production of psychopathology
should be designated as an experi- 
mental analogue of the clinical be- 
havior in question. It is scientifically 
legitimate to endeavor to abstract 
and to purify emotions as they have 
done, but these, by definition, are 
not clinical phenomena; it is con- 
trolled, laboratory research which 
purposely creates conditions to elimi-
nate the clinical “taint” of the data.

In terms of the paradigm, the 
major shortcoming of direct sugges- 
tion is the identity between the geno- 
type and phenotype. The subject 
merely carries out the suggestions 
that are given to him; the instructions 
specify the behavior. This also means 
that direct suggestion is not response- 
producing in the sense that other 
processes are set in motion which lead 
to the behavioral outcome. In order 
to satisfy the paradigm, a process 
must be induced that has the capacity 
to trigger off a chain of events that 
eventuates in psychopathology. The 
nature of the genotype and the condi- 
tions under which it is induced will 
reflect some theory about personality 
and psychopathology. In this sense, 
the paradigm is a procedure for test- 
ing theories. Direct suggestion tests 
nothing but itself. 


The Induction of Artificial Conflicts 


In studies of this type (Bobbitt, 
1958; Counts & Mensh, 1950; Erick- 
son, 1944; Huston, Shakow, & Erick- 
son, 1934; Luria, 1932), the subject is 
provided with a paramnesia regard- 
ing a situation to which he has a dis- 
tressing emotional reaction, such as 
hostility or remorse. In one way or 
another, the subject is usually told 
that he will not remember anything 
about the experience posthypnoti- 
cally, but, nevertheless, it will be 
disturbing to him. Although the in- 
duced experiences are intended to be 
perceived as “real” rather than con-
trived, the paradigm is not satisfied:
the subject is told that he will not re- 
call the paramnesia or that he will 
recall it to a certain degree; further- 
more, he is told that the paramnesia 
will be a source of posthypnotic dis- 
turbance. The design of these studies 
is also incomplete because of the lack 
of control subjects who are asked to 
fake hypnosis, and the significance of 
the results is vitiated by the relatively 
weak disturbances that were pro- 
duced. 

Wohlberg (1947) reported a pro- 
cedure which seems to approach 
closely the paradigm. Instead of 
implanting a paramnesia, he sug- 
gested an impulse that would pro- 
duce conflict in the waking state. 
His instructions were as follows: 

When you awaken you will find next to you 
a bar of chocolate. You will have a desire to 
eat the chocolate that will be so intense that it 
will be impossible to resist the craving. At the 
same time you will feel that the chocolate does 
not belong to you and that to eat it would be 
very wrong and very bad. You will have no 
memory of these suggestions when you
awaken, but you will, nevertheless, react to
them (p. 337).


The distinctive aspect of his in- 
structions is the posthypnotic sug- 
gestion of an overwhelming impulse 
which is rendered anxiety-producing 
by pitting it against conscience. 


Although his subjects were in- 
structed to perceive the induced im- 
pulse in terms of conscience, they 
were not instructed to develop symp- 
toms. Accordingly, it is of great in- 
terest that the procedure spontane- 
ously produced both somatic and 
psychological reactions, which in- 
cluded such marked symptoms as diz- 
ziness, tachycardia, and a negative 
hallucination. Since his procedure 
approximates closely the paradigm, 
the posthypnotic psychopathology 
may very well be a valid clinical phe- 
nomenon. If he had used the proper 
control subjects, a more positive
statement could be made. In order
for his instructions to be perfect, the 
subjects should not have been told 
how to perceive the impulse nor 
should he have suggested an amnesia. 
The impulse should spontaneously 
generate all subjects’ reactions. 

An investigation by Reyher (1961) 
also approaches the paradigm. Un- 
der deep hypnosis the subjects were 
given a hallucinatory experience that 
generated intense feelings of hostility
toward a given individual. The in- 
structions were as follows: 

Now listen carefully. After I awaken you, 
you will not be able to remember anything 
about this session. However, anything that 
comes into your conscious mind that is 
associated with this experience [specific 
classes of words are mentioned] will stir up 
overwhelming feelings of hate. If these feel-
ings break into consciousness, you will
realize that it is the person who owns these
papers [which are within arm’s reach] that you
hate, and you will have an overwhelming urge
to tear them up.


Posthypnotic conflict was created
by presenting tachistoscopically crit-
ical and neutral pairs of words until
one word of each pair was recognized.
Ideally, the instructions should not
have included such an ambiguous
word as “if,” nor should an amnesia
have been produced, even though
the conflict-producing impulse was
to be experienced and acted upon
posthypnotically. Nevertheless, the
procedure produced much psycho-
pathology. The recognition of con-
flict words produced such reactions as
urticaria, tachycardia, gastric dis-
tress, headache, flushing, sweating,
tics, tremors, and such psychological
reactions as anxiety, apprehension,
dissociation, and derivatives. One of
the most important findings was a
correlation of .74 between the degree
of repression of the induced conflict and
the proportion of somatic complaints.

Other than the use of the ambigu-
ous word “if” in the hypnotic instruc-
tions and the suggested amnesia,
this study satisfies the paradigm. The 
induced hostility contained no clues 
in relation to the occurrences of 
psychopathology, many symptoms 
were produced, and proper control
subjects did not report symptoms. 
Accordingly, it is reasonable to con- 
clude that the psychopathology was 
genuine. Spontaneous symptomatic 
reactions to hypnotically induced 
processes have been summarized by 
Weitzenhoffer (1953). Although 
these case reports of idiosyncratic 
reactions to hypnotic procedures are 
illuminating, they do not lend them- 
selves to laboratory investigation 
because of their unreliable and un- 
controlled nature. 


The Activation of Natural Conflicts 


Two studies fall into this category. 
Gordon (1959) instructed his sub- 
jects to bring to mind episodes in- 
volving conflict with parents. The
subjects were given differing degrees
of posthypnotic awareness of the
episodes. No symptoms were re- 
ported. Although this approach 
gains clinical significance by permit- 
ting the subject to dwell upon his own 
emotionalized experiences, the in-
vestigation is difficult to interpret be-
cause of the posthypnotic suggestion
to achieve a given degree of aware- 
ness. Utilizing the subject’s own 
conflicts is a promising method for 
producing psychopathology and 
eliminates almost entirely the criti- 
cism of artificiality, provided that the 
subject is not instructed how to re- 
act posthypnotically. Nevertheless, 
it must be shown that such reliving 
of past experiences is anxiety-pro- 
ducing. Since no symptoms were re- 
ported, the conflicts were probably 
not intense enough to generate symp- 
toms or other phenotypic mani-
festations of psychopathology.

An investigation by Reyher and
Shoemaker (1961) is also pertinent. 
TAT cards were utilized as stimuli 
for producing age regressions and the 
reliving of important emotionalized 
experiences. Ten TAT cards were 
selected randomly to be con- 
flictual or neutral for four subjects 
who were capable of deep hypnosis. 

In order to create a conflict to each 
of five cards, hypnotized subjects 
were told, as they looked at each 
card, that disturbing emotions would 
be aroused. The subjects were then 
regressed to a time when these emo- 
tions were difficult to manage. In 
order to create nonconflictual or 
neutral reactions to the other five 
cards, the instructions were the same 
as above except that the emotions 
were nondisturbing. A posthypnotic 
amnesia was suggested and, in addi- 
tion, the subject was told that the 
cards would stir up the same feelings 
as before, and that he would reveal 
them directly or indirectly in the 
stories that he would be asked to tell. 
In the waking state, the subject was 
given the same cards, by another 
experimenter, with standard in-
structions.

Although no symptoms were re-
ported by the subjects, marked differ-
ences were observed between the con-
tent of the hypnotic reactions and 
the waking stories. The conflict- 
cards were generally associated 
with more alterations than were the 
neutral-cards; however, on some occa- 
sions the latter also were associated 
with marked differences. These differ- 
ences for both kinds of cards almost 
always reflected unresolved conflicts 
and helped guide psychotherapy. 

Unfortunately the paradigm had 
not been formulated before this in- 
vestigation was carried out. The in- 
duced processes were not kept dis- 
tinct from suggested phenotypic be-
havior, as the instructions do not
permit the induced process to gen- 
erate spontaneously all of the sub- 
ject’s behavior: the subject is in- 
structed to tell a story related di- 
rectly or indirectly to the induced 
process. Since, in broad terms, the 
subject is told how to respond, it is 
impossible to determine what re- 
sponses were generated by the in- 
duced process and what responses 
were a direct reflection of the hypnot- 
ic suggestion. In order to satisfy 
the paradigm, the instructions should 
state that the posthypnotic adminis- 
tration of each TAT card will stir up 
the same thoughts and feelings as it 
did during the hypnosis and that 
these reactions will become over- 
whelmingly intense. 

Subsequent experience has shown 
that the indirect versus direct option 
can be dropped, because even neurot- 
ic subjects are not easily over- 
whelmed by hypnotically induced 
conflict. The subject’s defensive 
organization usually does a good job 
in regulating hypnotically induced 
stress; nevertheless, the experimenter 
must be constantly alert for signs of 
a serious breakdown in ego functions 
when the subject is experiencing 
distress. 

The absence of overt psychopathol- 
ogy may be attributed to the fact 
that the impulse was not suggested to 
be overwhelming and that the sub- 
ject was given an option of how to 
respond. Despite these inadequacies 
in design for the production of psycho- 
pathology, the hypnotic reactions 
that produced the most alterations in 
the waking stories were congruent 
with central areas of conflict described 
in earlier psychodiagnostic impres- 
sions but which had not yet come up 
in psychotherapy. This observation 
indicates that there were significant 
psychodynamic reactions involved
and that this procedure might be very
productive of psychopathology, if 
utilized properly. 

There is reason to believe that the 
conflict-producing potential of the 
hypnotic reactions can be intensified. 
It was observed that in subsequent 
psychotherapeutic sessions with the 
subjects, ‘‘deeper’’ aspects of the hyp- 
notic reactions which were markedly 
changed in the waking state often 
could be uncovered by the induction 
of successive dreams about the mate- 
rial ‘‘behind’’ them. By telling the 
subject that he would have a dream 
about the emotions and thoughts 
behind his hypnotic experience, the 
material often became progressively 
more clearly represented until an 
abreaction of emotionally charged 
experiences took place. This ma- 
terial is valuable from the point of 
view of psychotherapy and, for re- 
search purposes, may be used to pro- 
duce anxiety and psychopathology in 
the posthypnotic state. 

These procedures would seem to 
have the greatest potential for creat- 
ing psychopathology, but they also 
have the disadvantage that they 
should be restricted to subjects who 
are in psychotherapy or those who are 
waiting to begin. The experimenter 
must be in a position to help the sub- 
ject work out adverse reactions if 
they should occur; otherwise, he 
places himself in an untenable ethi- 
cal and professional position. 


DISCUSSION 


All previous research in the hyp- 
notic induction of psychopathology in
some way has interfered with the
spontaneous reactions of the subjects 
by instructing them how to react to 
the induced processes; consequently, 
the interpretative significance of the 
subjects’ reactions is reduced in pro- 
portion to the extent of the interfer-
ence. If the induced processes have
no intrinsic capacity for spontane- 
ously producing alterations in be- 
havior—such as distortions, repres- 
sion, psychosomatic reactions, etc.— 
then the induced processes have no 
real clinical significance, and the 
imposed experimental reactions are 
merely hypnotic suggestions to be 
carried out. 

Two methods derived from psycho- 
analytic theory were presented in 
which emotions can be linked with 
anxiety and the production of psycho- 
pathology: one artificial and the 
other natural. First, an emotion, 
such as hate, is brought to over-
whelming intensity by a set of
appropriate hallucinatory experi- 
ences (paramnesia). Since the intense 
hate would pose a vital threat to the 
subject’s security under the cir-
cumstances of the waking state, it is
hypothesized that its activation by a
posthypnotic signal creates the con-
ditions for conflict, anxiety, and
psychopathology. The posthypnotic 
intensification of hostility activates 
the subject’s traditional defenses 
against hostility of such intensity; 
that is, there is a danger point in the 
intensity of hostility beyond which 
the subject would lose control and, 
thereby, subject himself to the re- 
taliation of the environment. The 
necessary controls and defenses are 
learned early in life and are triggered 
off in the posthypnotic state at the 
time the relevant posthypnotic signal 
is given. The second method is the 
same as the first except that the sub- 
ject’s own idiosyncratic conflicts are 
activated by the posthypnotic signal. 
His defenses against anxiety-produc- 
ing processes are pressed beyond 
their usual limits, and anxiety and 
psychopathology are produced. 

The paradigm can be utilized to 
test theories regarding specific kinds 
of psychopathology and almost any
alteration in personality. For ex-
ample, if a state of depression is de- 
sired, there are at least two clinical 
models from which to choose: the 
subject is made to believe that the 
objects and symbols for the grati- 
fication of his important emotional 
needs are no longer available, and 
in the waking state, he is given a 
paramnesia consistent with these 
events; if clinical, reactive depres- 
sion is desired, hostility is induced 
toward loved ones in subjects who, 
on the basis of previous knowledge, 
turn this kind of hostility inwards. 
More directly, it may be possible to 
condition subjects who have a ten- 
dency in this direction to react to 
their own hostility by turning it in- 
wards, and then to produce a param-
nesia which involves a situation
that normally would lead to intense 
hostility. The subject is given a post- 
hypnotic signal for this hostility to 
become intense and conscious. Only 
those subjects are retained for study 
who do not achieve awareness of their 
hostility, despite the posthypnotic 
suggestion to do so. 

In regard to the induction of a 
paramnesia or the implantation of an 
impulse that ordinarily is foreign to 
the subjects, there is reason to believe 
that a suggested amnesia for the hyp- 
notic session may be necessary. If an 
amnesia is not suggested, it may be 
that enough fragments of the session 
will be recalled by the subject for 
him to realize that the experimenter 
had implanted something, and the 
growth of subsequent insight into the 
true nature of the experience would 
render the conflict innocuous. A sug- 
gested amnesia would prevent the 
subject from acquiring insight and 
preserve the conflict. 

This also may be true for the 
activation of the subject’s own con- 
flicts. The fact that the experimenter 
succeeds in getting a hypnotized sub-
ject to become aware of conflictual
material indicates that repression 
already had started to break down 
and that the subject may find it 
relatively easy to become aware of
the material in the posthypnotic 
state. The hypnotic uncovering of 
conflictual material in patients un- 
dergoing psychotherapy supports this 
observation. When potent repressed 
material becomes represented in hyp- 
nosis, this indicates that the forces 
maintaining repression have been 
growing progressively weaker. This 
is illustrated by the fact that only 
after many months of intensive 
psychotherapy, including hypno- 
analytic techniques, do the most sig- 
nificant repressions begin to lift.
They begin to break down because
the way has been prepared by the
progressive development of a more
secure relationship with the psycho-
therapist and the prior achievement
of insight into less intense facets of
basic conflicts. Most psychothera-
pists who are experienced with hyp- 
noanalytic techniques realize that 
while hypnosis is not an immediate 
and direct route to the uncovering of 
repressed material, it is certainly 
more rapid and more direct than most 
other methods. Once something has 
been uncovered in hypnosis, subse- 
quent insight in the waking state is 
usually attained readily. It may be 
that a posthypnotic amnesia rein- 
forces repressive forces and thereby 
preserves the capacity of the induced 
dynamics to produce psychopathol- 
ogy. 

There is some evidence that the 
relationship between the experi-
menter and the subject is a significant
factor in successfully inducing con- 
flicts. In an unpublished study, the 
author was able to replicate the re- 
sults of an earlier study (Reyher, 
1961) which produced somatic and 
psychological reactions to the post-
hypnotic stimulation of hypnotically
induced conflict. However, an assist- 
ant using the same procedure could 
produce only symptoms of a rela- 
tively mild degree. Other than the 
different experimenters, there was one 
obvious difference in the preparation 
of the subjects that might account for 
this discrepancy. The subjects who 
were used by the assistant were hyp- 
notized and brought to a deep trance 
by someone else. The assistant merely
saw them for one brief session before 
the experimental session in order to 
establish the depth of the trance. In 
contrast, the senior experimenter had 
begun with naive subjects and 
brought them to a deep level of 
hypnosis himself in the course of 
three or four sessions. By the time 
the subjects were ready for the experi- 
mental sessions, they seemed to be 
quite at ease and trusting of the ex- 
perimenter. It is reasonable to 
hypothesize that an unfamiliar ex- 
perimenter would arouse some anx- 
iety and defensive behavior that 
would interfere with the effect of any 
suggestions of a personal nature. 

No matter what is to be induced 
hypnotically, it is wise to present 
the instructions in the passive voice. 
The use of the passive voice reduces 
the possibility that the subject may 
act out a role to please the experi- 
menter. More specifically, the subject 
should not be instructed to carry out 
suggestions but he should be informed 
that he will be acted upon by some- 
thing or that he is going to experience 
something. Not only does the active 
voice promote the expectation that 
the subject should do something, but 
it also enhances volitional, adaptive 
processes which render the hypnotic 
behavior similar to waking behavior. 
Thus, the instructions should mini- 
mize the role of volitional processes 
and maximize the role of nonvoli-
tional processes.